Hacker News
by TheOtherHobbes 3341 days ago
I don't think the tech will improve fast. I've been watching speech synthesis since the 80s, and progress hasn't accelerated over that time.

Speech synthesis is one of those 90% problems - when you're 90% done, you find you only have 90% left to do.

This level of synthesis is relatively easy. Getting to the "can reliably pass for the real thing" level is going to take a huge amount of extra work.

It's not even about computational power - it's about the sophistication of the models, and their ability to parse words into phonemes correctly with some knowledge of social and linguistic context.

"Good enough for some applications" - like phone switchboard systems - is a simpler problem. Virtual impersonation is very much harder.

2 comments

I was pretty impressed by fake Obama's voice. Obviously it doesn't stand up to close scrutiny, but I think if I heard it playing in the background, I could be fooled. And the biggest giveaway was the occasional weird intonation rather than the timbre of his voice. All they have to do is let you say a sentence yourself and then render your intonation in the other person's voice.
I think you overestimate the complexity and the work required to get to virtual impersonation. This will be a problem sooner than you think.