by trevyn 3065 days ago
"Deep Voice 2 can learn from hundreds of voices and imitate them perfectly. Unlike traditional systems, which need dozens of hours of audio from a single speaker, Deep Voice 2 can learn from hundreds of unique voices from less than half an hour of data per speaker, while achieving high audio quality." - http://research.baidu.com/deep-voice-2-multi-speaker-neural-...

Now imagine that with Tacotron-level quality, and you'd get that "strange" effect with anyone: meeting their own vocal clone.

This is still text-to-speech, so it's not live-copying your intonation, but you could easily imagine a seq2seq network designed to do so.