Hacker News new | ask | show | jobs
by claytonius 2380 days ago
I think you're right that most DeepFakes use Variational AutoEncoders (As described by the article) where the latent space is jointly distributed between the two domains, allowing for translation. For additional context: AutoEncoders are also frequently adversarially regularized so you're not wrong about GANs being involved. My understanding is that the most accurate implementations also have some manual feature engineering for more tractable problems like facial orientation.

One of my favorite related papers: https://arxiv.org/pdf/1703.00848.pdf