Hacker News
by fpgaminer 3571 days ago
I'm guessing DeepMind has already done this (or is already doing so), but conditioning on video is the obvious next step. It would be incredibly interesting to see how accurately it can generate the audio for a movie. Though I imagine for really great results they'll need to mix in an adversarial network.
1 comment

Oh yes, extract voice and intonation from speech in one language, then synthesize them in another language, and we get automated dubbing. Could also possibly try to lip-sync.