by quiteawhile 3228 days ago
I haven't read the paper yet, and I know very little about this, but listening to the audio samples, it seems that one of the most notable changes is the intonation as one phrase moves into the next. Did anyone else catch something like that? I'm not sure I'm explaining it well. If you listen to all the iOS 11 samples, it stands out.

Anyway, it's the only way I can still identify this as a fake voice: the intonation always follows the same cadence (not sure if that's the right word?). We really shouldn't have overused the word "awesome" before this kind of thing came along.

There's also a kind of dread, tbh. This kind of seamless TTS has the potential to change a lot of things. First of all, criminals are going to love this, and so are YouTube pranksters. Eventually this will shake up the voice acting industry in a way that may not be healthy for voice actors, while at the same time allowing projects with smaller budgets to have incredible voice work (and dubbing).

What I think is really important, tho, is that as we move away from the uncanny valley, our relationship with these voices changes. Our brains can't listen to a voice this real without imagining a person behind it, even as adults.

Ironically, right now I'm wearing an old Threadless sweatshirt that says "this was supposed to be the future", but nowadays I can honestly say we're getting there.


Regarding voice acting, I think there is something to be said for human expression and ad-libbing. Sure, you could generate a natural-sounding computer voice, but in the arts we still have a ways to go before a computer can go off script and put just the right amount of emphasis on a certain word to turn a phrase into an iconic quote.

Similarly, we don’t see CGI motion capture replacing Andy Serkis any time soon.

I think this is less likely to hit major films or TV shows, but it will hit the audiobook and video game markets pretty hard.

I'm pretty excited about the video game side.

I think you're overstating things. On the one hand, a lot of applications where quality wasn't that critical switched over ages ago. On the other hand, any application that would have spent the money on voice acting is still going to pay, both for the higher quality and for a sound that isn't the same one everyone else is using. (Note that Siri's new iOS voice is based on a new training set from a new person.)

I do think there are applications that we just don't have today because TTS isn't good enough. I've had some ideas for Alexa apps built around content that would be read out via TTS, but the current Polly just isn't human enough. I don't think this is there yet either, but it's getting close.