
AI, Deepfakes, and the Case for Presence Verification
Jun 5, 2026

The identity challenge in enterprise security is about to get significantly harder.
Synthetic media (deepfake audio, cloned voices, and AI-generated video) is no longer a theoretical threat. It is a practical one. The tools required to clone a voice convincingly are freely available, require only seconds of source audio, and produce output that the human ear cannot distinguish from the original. Real-time deepfake video, until recently the preserve of well-funded state actors, is now accessible to motivated individuals with standard hardware.
Most enterprise communication infrastructure has no mechanism to detect any of it.
What Has Changed
The synthetic media threat is not new. What is new is the scale, accessibility, and real-time capability of the tools now available.
Voice cloning, in particular, has undergone a step change. In 2020, producing a convincing voice clone required significant training data and specialist capability. In 2026, it requires only a short audio sample (a voicemail, a recorded meeting, a public interview) and a consumer-grade tool. The resulting output can be deployed in real time: over a phone call, in a voice note, or embedded in a live video feed.
The consequences for enterprise security are significant and underappreciated. Business Email Compromise (BEC), already responsible for billions in losses annually, is now being augmented with voice and video. Attackers who previously impersonated executives via email alone can now layer in audio that is indistinguishable from the impersonated executive's voice. Finance teams receive calls that sound exactly like their CFO authorising a transfer. Legal teams receive voice messages from people who sound exactly like their partners. And increasingly, those calls are being reinforced with video.
The attack is no longer a phishing email with a suspicious link. It is an entirely synthetic conversation, conducted in real time, with a person who does not exist.
Why Enterprise Communications Are Particularly Exposed
The structure of enterprise communications creates a specific vulnerability to AI-enabled impersonation.
Most platforms verify identity once, at login, and then trust the session. Once a user is authenticated, the assumption is that the person in the meeting, on the call, or sending messages is the person who authenticated. There is no further check. No ongoing verification. No mechanism to confirm that the voice on the call belongs to the face that logged in, or that the face on the video belongs to the person whose credentials were used to join.
This model was adequate when the only thing an attacker could do was steal credentials. If you could prove you had the right password, you were trusted as the right person. The session layer was irrelevant because there was no practical way to join a session without being the authenticated user.
Synthetic media breaks that assumption entirely. An attacker with a voice clone and a deepfake video feed can be present in a session, both audibly and visibly, while the authenticated user is elsewhere entirely. The session layer is no longer irrelevant. It is the attack surface.
Why Detection Is Not the Answer
The instinctive response to the deepfake problem is detection: build tools that can identify synthetic media and flag it in real time.
This is a reasonable instinct, but it is the wrong frame.
Deepfake detection and deepfake generation are in a continuous arms race, and the generator has a structural advantage. Detection tools are designed to identify artefacts from current-generation methods. Generation tools evolve to eliminate those artefacts. Every improvement in detection capability drives a corresponding improvement in generation capability. The lead alternates, but the trend is clear: generation is outpacing detection, and the gap is widening.
There is also a practical problem. Detection tools produce false positives. In an enterprise environment, a system that flags legitimate communications as synthetic at any meaningful rate is operationally unusable. The costs of false positives (disrupted communications, eroded trust in the system, and user workarounds) rapidly exceed the benefits of catching genuine attacks.
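To make the false-positive problem concrete, here is a minimal base-rate sketch. Every figure in it is a hypothetical assumption chosen for illustration (session volume, attack frequency, detection rate, false-positive rate), and `alert_breakdown` is an illustrative helper, not part of any real detection product.

```python
def alert_breakdown(sessions, attacks, detection_rate, false_positive_rate):
    """Split one day's alerts into true and false ones.

    All inputs are hypothetical; none come from a real detection product.
    """
    legitimate = sessions - attacks
    true_alerts = attacks * detection_rate
    false_alerts = legitimate * false_positive_rate
    total_alerts = true_alerts + false_alerts
    # Share of alerts that correspond to a real attack (0.0 if no alerts at all).
    precision = true_alerts / total_alerts if total_alerts else 0.0
    return true_alerts, false_alerts, precision


# Assumed figures: 10,000 calls and meetings a day, one of them an attack,
# a detector that catches 95% of fakes and wrongly flags 1% of real sessions.
true_alerts, false_alerts, precision = alert_breakdown(
    sessions=10_000, attacks=1, detection_rate=0.95, false_positive_rate=0.01
)
print(f"true alerts per day:  {true_alerts:.2f}")
print(f"false alerts per day: {false_alerts:.2f}")
print(f"share of alerts that are real attacks: {precision:.1%}")
```

Under these assumed figures, roughly 100 legitimate sessions are flagged each day for about one genuine attack caught, so fewer than 1 in 100 alerts is real. Because attacks are rare relative to legitimate traffic, even a seemingly low false-positive rate dominates the alert stream.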
But the more fundamental issue is that detection is asking the wrong question. The question is not "Is this communication synthetic?" The question is "Is this person actually present?"
These are different questions, and only one of them has a reliable answer.
Presence Verification Changes the Frame
Presence verification does not try to detect what is fake. It confirms what is real.
Continuous facial recognition with liveness detection and depth verification does not analyse a video stream for deepfake artefacts. It confirms that a specific, verified individual is physically present, in three dimensions, in real time, in front of the device. If presence cannot be confirmed, the session does not proceed.
This approach sidesteps the detection arms race entirely. It does not matter how convincing a deepfake is, or how accurately a voice clone reproduces someone's speech patterns. If the verified individual is not physically present in front of the device, presence verification fails. The attack fails with it.
The distinction matters because it is architecturally durable. Detection-based approaches require constant updates as generation methods evolve. Presence-based approaches do not compete with generation tools; they are asking a different question that generation tools cannot answer.
For enterprise communications, this is the credible response to AI-enabled impersonation. Not trying to identify synthetic media after the fact. Confirming that the right person is actually there before the session proceeds and continuously throughout it.
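The gating behaviour described above can be sketched as a small state machine. This is an illustration under stated assumptions, not the YEO SDK or any real API: `PresenceResult`, its three fields, and `run_session` are hypothetical names, and the individual checks (face match, liveness, depth) are represented as already-computed booleans rather than real biometric calls.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class PresenceResult:
    """Outcome of one presence check (hypothetical structure)."""
    identity_match: bool  # face matches the enrolled, verified user
    live: bool            # liveness check passed
    depth_ok: bool        # depth (three-dimensional) check passed

    @property
    def confirmed(self) -> bool:
        # Presence is confirmed only if every check passes.
        return self.identity_match and self.live and self.depth_ok


def run_session(checks: Iterable[PresenceResult]) -> Tuple[str, int]:
    """Consume periodic presence checks and stop at the first failure.

    Returns (final_state, checks_passed). A failure on the very first
    check refuses the session; a later failure terminates it.
    """
    passed = 0
    for result in checks:
        if not result.confirmed:
            return ("terminated" if passed else "refused", passed)
        passed += 1
    return ("completed", passed)


present = PresenceResult(identity_match=True, live=True, depth_ok=True)
# A replayed or synthetic feed may match the face but fail liveness and depth.
spoof = PresenceResult(identity_match=True, live=False, depth_ok=False)

print(run_session([present, present, spoof, present]))  # ('terminated', 2)
```

The design point is that the decision never depends on judging whether the media looks synthetic. The session continues only while each check positively confirms the verified user, so a login-only model corresponds to running a single check and trusting everything after it.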
What This Means for Enterprise Security Teams
The practical implication for security and compliance leaders is straightforward: the threat model for enterprise communications needs to be updated.
The assumption that authenticated credentials equal a trusted presence has always been imperfect. AI-enabled impersonation makes it untenable. A communication security architecture that cannot answer the question "is this person actually here?" has a gap that is now being actively exploited.
Closing that gap does not require replacing existing authentication infrastructure. Presence verification sits alongside it — at the session layer that login authentication was never designed to address. It adds the one check that the rest of the stack cannot perform: continuously confirming that the verified user is still present.
As synthetic media becomes easier to produce and harder to detect, that check is no longer optional for regulated industries. It is the baseline.
To see how continuous presence verification integrates into your communications infrastructure, book a call with the YEO team or explore the YEO CFR SDK.



