by LegendaryPatMan 3341 days ago
This is pretty basic at the moment, and it's terrifying. Yeah, it has an MS Sam feel to it, but as the tech improves, and we know it will, you could use a service like this to put words in someone's mouth. Think about how you could trip up a CEO or a politician by playing a clip of something they never said. When that gets into the zeitgeist, judgments will be made in the court of public opinion, devoid of facts or real evidence. You could destroy democracy or people's lives with technology like this.
3 comments

I actually have somewhat of an opposite opinion on this. As HN readers who are "in" on the cutting edge of tech, we know that things like this are possible (I first learned of it when I saw Adobe demo it a while ago), but this is not mainstream knowledge yet.

The sooner we can get to a point where everybody knows stuff like this (voice impersonation) is possible, the sooner we can avoid real damage (such as courts misjudging a case because an impersonated voice recording was accepted as evidence).

Yes, we lose an entire area of evidence that can be used in court (all voice recordings, possibly), but the tech was going to get here sooner or later and it was going to be a problem we'd have to deal with. I'd rather be at a place where everyone knows voice recordings are unreliable than have real harm done by impersonated voices because people didn't think it was possible.

Avoiding court misjudgment is reasonably possible.

How we're going to fight against people believing whatever sound bites from fake news they want to believe is a harder question...

I don't think the tech will improve fast. I've been watching speech synthesis since the 80s, and progress hasn't accelerated over that time.

Speech synthesis is one of those 90% problems - when you're 90% done, you find you only have 90% left to do.

This level of synthesis is relatively easy. Getting to the "can reliably pass for the real thing" level is going to take a huge amount of extra work.

It's not even about computational power - it's about the sophistication of the models, and their ability to parse words into phonemes correctly with some knowledge of social and linguistic context.
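
To make the phoneme point concrete, here is a toy sketch in Python. Everything in it is made up for illustration (the one-word lexicon, the cue list, the function name); it is not how any real synthesizer works. It only shows why "read" cannot be turned into phonemes without looking at the surrounding words.

```python
# Toy illustration only: a hypothetical one-word lexicon, not a real TTS front end.
# Pronunciations are written with ARPAbet-style symbols.
READ_PRONUNCIATIONS = {
    "past": "R EH D",     # rhymes with "red"
    "present": "R IY D",  # rhymes with "reed"
}

# Hypothetical context cues: if one of these appears BEFORE "read",
# guess the past-tense pronunciation.
PAST_CUES = {"had", "has", "have", "was", "already", "yesterday"}


def pronounce_read(sentence: str) -> str:
    """Pick a pronunciation for 'read' using a crude look-behind rule."""
    words = sentence.lower().replace(".", "").split()
    idx = words.index("read")  # raises ValueError if "read" is absent
    words_before = set(words[:idx])
    if words_before & PAST_CUES:
        return READ_PRONUNCIATIONS["past"]
    return READ_PRONUNCIATIONS["present"]


print(pronounce_read("I have read the book."))  # cue before the word
print(pronounce_read("I will read the book."))  # no cue
print(pronounce_read("I read it yesterday."))   # cue comes after the word
```

The third sentence is the interesting one: the cue comes after the word, so this rule picks the present-tense form even though a human would say "red". Patching cases like that one by one is the kind of long tail I mean.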

"Good enough for some applications" - like phone switchboard systems - is a simpler problem. Virtual impersonation is very much harder.

I was pretty impressed by fake Obama's voice. Obviously it doesn't stand up to close scrutiny, but I think if I heard it playing in the background, I could be fooled. And the biggest giveaway was occasional weird intonation rather than the timbre of his voice. All they have to do is make it so that you say a sentence and it keeps your intonation but renders it in the other person's voice.
I think you overestimate the complexity and the work required to get to virtual impersonation. This will be a problem sooner than you think.
There are human impersonators already. I suppose it's not that easy to fake a visible, high-ranking person for long.
Individual impersonators are not the threat. It's the glut of impersonators that will present the real challenge. It would be very helpful to see a study done with these platforms as they mature to determine what percentage of the population is more easily fooled by these.

For example, as an individual with hearing problems, I may not be able to tell a synthesized recording from an actual recording so easily - at least for a short period of time. With longer recordings it may become more obvious.

Yes, but imagine a human impersonator who has infinite time to take requests from anyone and generate free recordings of any person with a substantial online audiovisual presence.
Bad joke of the day: Even Trump can't do it, and he really is President of the U.S.!