by alonmln 1633 days ago
Cool, it's impressive how much it can do with a short sample, although this seems like an easy way for end users to deepfake their friends / enemies saying something.
3 comments

Currently we’re looking at possible solutions, see for example here[1]. If you have suggestions, feel free to chime in!

In the demo we specifically disallowed bulk uploads to hinder such abuses.

[1] https://github.com/coqui-ai/TTS/discussions/1036

I tested it with your comment: https://sndup.net/mghy/ :)

It's also a new way to somewhat personalize text-to-speech engines. The above example is not really close to my voice, though.

Maybe the solution is to have the user read a randomly generated paragraph of text that expires after a short amount of time. That way you can't predict it, and you don't have enough time to splice together a fake reading from something else.
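
For what it's worth, here is a rough Python sketch of that idea, not anything the demo actually does. The word list, the 60-second window, and the names `issue_challenge` and `verify` are all made up for illustration, and it assumes you already have a transcript of the uploaded recording from some speech recognizer (not shown).

```python
import secrets
import time
from dataclasses import dataclass

# Hypothetical vocabulary; a real system would want a much larger one.
WORDS = [
    "amber", "river", "falcon", "window", "seven", "quiet", "marble",
    "orange", "ladder", "frozen", "candle", "harbor", "silver", "garden",
]

TTL_SECONDS = 60  # assumed expiry window, chosen arbitrarily


@dataclass(frozen=True)
class Challenge:
    text: str          # the paragraph the speaker must read aloud
    expires_at: float  # Unix timestamp after which the challenge is void


def issue_challenge(num_words=12, now=None):
    """Create an unpredictable paragraph with a short lifetime."""
    now = time.time() if now is None else now
    text = " ".join(secrets.choice(WORDS) for _ in range(num_words))
    return Challenge(text=text, expires_at=now + TTL_SECONDS)


def _normalize(s):
    # Ignore case and whitespace differences in the transcript.
    return " ".join(s.lower().split())


def verify(challenge, transcript, now=None):
    """Accept only if the challenge is still live and the words match."""
    now = time.time() if now is None else now
    if now > challenge.expires_at:
        return False
    return _normalize(challenge.text) == _normalize(transcript)
```

The exact-match check is deliberately naive. Real transcripts are noisy, so you would probably need a fuzzy comparison, and this does nothing to prove that the voice in the recording belongs to the person uploading it.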
The problem with any anti-abuse measure is that someone can create another project that has none of it. There are a handful of projects that can do pretty good voice synthesis right now. It would be about as easy as getting all photo editing tools to agree to place a watermark on images to prevent abuse.