by ryanbrunner 3065 days ago
I was about to say that those clips still registered as a computer quite easily to me, until I got to the comparison with a human voice.

I think I've just gotten so used to that voice as the "Google" voice that I automatically associate it with computers. It would be strange to meet the human who provided the human voice in those samples.

"Deep Voice 2 can learn from hundreds of voices and imitate them perfectly. Unlike traditional systems, which need dozens of hours of audio from a single speaker, Deep Voice 2 can learn from hundreds of unique voices from less than half an hour of data per speaker, while achieving high audio quality." - http://research.baidu.com/deep-voice-2-multi-speaker-neural-...

Now imagine that with Tacotron quality, and you'll get that "strange" effect with anyone, meeting their vocal clone.

This is still text-to-speech, so it's not live-copying your intonation, but you could easily imagine a seq2seq network designed to do so.

I had a very strange experience like that recently when listening to Radio 3, the BBC's mostly-classical channel. They had an opera programme with guest presenters from the Met Opera in New York. The usual BBC presenters of course have British accents, and one of these American presenters had a particular accent that my brain latched onto as matching the sound of synthetic speech. I just could not suspend disbelief and convince myself that this speech - which rationally of course I knew was human - was that of a real person rather than some sort of AI assistant. It was a very strange feeling.

I did have a fever at the time, which might not have helped.

Maybe that person was the human source of a text-to-speech voice you use, perhaps in your GPS or a book-reading app?