Interestingly, Zelensky might be easier to deepfake than other world leaders, because his acting career means there's more footage and a wider variety of facial expressions to train on.
Might not be so obvious to a soldier who has slept one hour a night for the past two weeks, eaten only canned corn and tuna, hasn't gotten ammo replenishment in days, and is watching this video on a tiny cracked screen with a dying battery with the volume very low as the internet connection fades in and out.
That's true for now, but as the technology progresses it will unfortunately take very little training data to create a deepfake of anyone. Probably just a minute or two of video.
Is this true? I don't see much evidence that the technology is progressing in a direction where models will require less training data; if anything, the trend seems to be towards models with higher and higher parameter counts.
Better models are coming out that are already pretrained on a significant amount of data, so the model has already learned a lot about what is common to all examples of video generation (keeping the edges aligned coherently at every frame, keeping texture and lighting coherent, etc.) and will not need to re-learn that for every target.
Initially, deepfake models were trained from scratch for every single target, so you had to provide a lot of data from the person you wanted to target for the model to learn both what is common and what is specific.
Now you can get decent performance with much less data, since you only need to learn the specifics.
However, this only helps if you need a limited deepfake: the model cannot infer the exact facial expression of the target when they are, for example, laughing, unless you provided an example of that in the training data (assuming there is no way to infer someone's laughing expression from the other expressions provided). It will instead generate a generic laugh. All missing information is substituted by what was seen, on average, in the pre-training phase.
That wouldn't work for a long, complex deepfake meant to be sent to someone reasonably close to the target.
But for the type of deepfake that targets a personality we all know, but not very well at all, much less data is needed than before for a similar result.
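To make the "generic laugh" point concrete, here is a toy sketch, not how any real deepfake model works. Each expression is reduced to a single made-up number, and the names `PRETRAINING`, `population_average` and `render` are invented for the illustration. The only thing it shows is the fallback behaviour: whatever the target did not provide gets filled in with the pre-training average.

```python
from statistics import mean

# Toy "pre-training" data: one number per (person, expression), standing in
# for a full set of facial parameters. All values are made up.
PRETRAINING = {
    "person_a": {"neutral": 0.0, "smile": 1.0, "laugh": 2.0},
    "person_b": {"neutral": 0.2, "smile": 1.4, "laugh": 2.6},
    "person_c": {"neutral": -0.2, "smile": 0.6, "laugh": 1.4},
}


def population_average(data):
    """Average each expression over everyone seen during pre-training."""
    expressions = {e for person in data.values() for e in person}
    return {e: mean(p[e] for p in data.values() if e in p) for e in expressions}


def render(expression, target_examples, generic):
    """Use the target's own example if one was provided, else the average."""
    if expression in target_examples:
        return target_examples[expression], "target-specific"
    return generic[expression], "generic fallback"


GENERIC = population_average(PRETRAINING)

# The target only supplied a neutral face and a smile, no laugh.
target = {"neutral": 0.5, "smile": 2.5}

for expression in ("smile", "laugh"):
    print(expression, render(expression, target, GENERIC))
```

The smile comes out as the target's own, while the laugh is just the population average, which is exactly the part someone close to the target would notice.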
At least in my experience - audio is much harder to convincingly fake than video. If you have heard the real person speaking, they have very specific and distinguishable patterns of speech.
You can fake it reasonably, but you need to have a very large collection of audio clips to do so, and if you do a bad job it literally jumps out at the viewer.
Video might be off, but it requires close attention and large screens to notice - much easier to miss if you're viewing on a phone.
This is called few-shot learning (or, in the limit case of a single example, one-shot learning). One way this may be achieved is by first training a very general model (maybe on huge data sets) and then fine-tuning it on a specific example (this is called transfer learning). [0]
One reason researchers suspected this must be possible is that human beings, as well as other animals, can learn stuff by watching it for just a few seconds. But we have some prior baggage, because we spent our whole lives learning other, vaguely related stuff, and it turns out that knowledge is often transferable.
[0] This isn't the only way; there's also meta-learning.
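A toy sketch of the pretrain-then-fine-tune idea, assuming a deliberately trivial setup: every "person" follows y = slope * x + offset, where the slope is shared by everyone and the offset is person-specific. The function names and all the numbers are made up for the illustration, and a linear fit stands in for what would really be a large neural network.

```python
import random


def make_person(offset, n, rng, slope=3.0, noise=0.1):
    """Generate n (x, y) samples for one person: y = slope*x + offset + noise."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, slope * x + offset + rng.gauss(0, noise)) for x in xs]


def pretrain_slope(people):
    """Estimate the shared slope from many people (pooled least squares
    on data centered per person, so individual offsets drop out)."""
    num = den = 0.0
    for samples in people:
        mx = sum(x for x, _ in samples) / len(samples)
        my = sum(y for _, y in samples) / len(samples)
        num += sum((x - mx) * (y - my) for x, y in samples)
        den += sum((x - mx) ** 2 for x, _ in samples)
    return num / den


def fine_tune_offset(samples, slope):
    """Fit only the person-specific offset, keeping the pretrained slope frozen."""
    return sum(y - slope * x for x, y in samples) / len(samples)


rng = random.Random(0)

# "Pre-training": 50 people with 200 samples each.
people = [make_person(rng.uniform(-5, 5), 200, rng) for _ in range(50)]
shared_slope = pretrain_slope(people)

# "Fine-tuning": a new person with only two samples.
new_person = make_person(offset=1.5, n=2, rng=rng)
offset_estimate = fine_tune_offset(new_person, shared_slope)

print("shared slope:", shared_slope)
print("offset for new person:", offset_estimate)
```

Two samples would be a poor basis for fitting both slope and offset from scratch, but they are enough to pin down the one parameter that is actually specific to the new person.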
I think the idea is that imitating a _specific person_ would require less data. Overall, a model like that might be trained on orders of magnitude more data in general, with only a small amount needed to imitate the features of a particular person.
I imagine it like - A master cabinet maker can make new variations of cabinets super easily once they've made hundreds of similar cabinets?
The real threat I think will not be in trying to trick people with fake videos, but instead deep flags: create fake videos showing awful things about the politician one wants to win, and then attribute the fakes to supporters of the guy you want to lose and make them look like animals.
Especially in coordination with a media campaign, it would be an effective way of undermining the opponent by making them look simultaneously grossly unethical and desperate. Even better if you can get the FBI involved to investigate whether the video was created directly by the opponent or "only" by his supporters. And anybody describing what actually happened could easily be straw manned into sounding like a nutter.
And the quality is already perfectly fine for this.
Not to mention, while it could be hard for random online trolls, certain state actors in your country of residence likely already have a ton of your face stored on video, celebrity or not.