Hacker News
by demonictoaster 1996 days ago
The security implications of this kind of tech are scary. Going forward, it will become really easy to reproduce anyone's voice! It seems that not much training data is required to achieve reasonable results (e.g. SpongeBob is just 27 min of voice, and the Half-Life Black Mesa Announcer is just 1.9 min!). This stuff could easily be leveraged for scams and deep fakes (along with deep learning models that could also tweak lip movements to match the voice, for example). Thankfully, there is also a very active area of research that leverages similar tech to detect deep fakes.
2 comments

These kinds of discussions are common with articles about deep fake video and audio. While I do not disagree with your point, here are two quick thoughts:

- We have had perfect image manipulation capabilities for quite some time now. We have had written text manipulation capabilities for hundreds of years.

- People will continue to believe what they believe, whether there is deep fake video and audio or not.

Agree with you. Hopefully people are becoming more aware that they cannot trust everything out there. We are fast approaching a point where we can make anyone appear to say anything we want, in both audio and video.
It's already happening:

A Voice Deepfake Was Used To Scam A CEO Out Of $243,000:

https://www.forbes.com/sites/jessedamiani/2019/09/03/a-voice...