by trevyn
3065 days ago
"Deep Voice 2 can learn from hundreds of voices and imitate them perfectly. Unlike traditional systems, which need dozens of hours of audio from a single speaker, Deep Voice 2 can learn from hundreds of unique voices from less than half an hour of data per speaker, while achieving high audio quality." - http://research.baidu.com/deep-voice-2-multi-speaker-neural-...

Now imagine that with Tacotron-level quality, and anyone could get that "strange" effect of meeting their own vocal clone. This is still text-to-speech, so it isn't copying your intonation live, but you could easily imagine a seq2seq network designed to do so.