I'm guessing that finding a technology to try to detect this would be over-engineering. I'd love to see a sample where the person with the swapped face passes a hand with spread fingers over their face, to see how the face swap handles that.
Perhaps we’ll see it become a requirement to use a closed platform like an iPhone, where it would be much easier to attest that the feed has not been tampered with.
It’s already a requirement sometimes to take a video of your face from multiple angles using your phone - some identity verification service forced me to do it.
I imagine that stuff like this will evolve to rely more on hardware attestation, or to use info from depth/lidar sensors to verify that the video and other sensor data align.