by dmckinno 625 days ago
This is a bit different. These audio clips use the default voice of each of these systems. I was asking about zero-shot voice cloning, i.e. transferring a recorded voice and synthesizing speech in that voice.

I tried zero-shot voice cloning in all of the top OSS models in the Arena and performance was bad.

1 comments

Most of those models DO do zero-shot cloning. The best is VoiceCraft. It's nearly 11Labs quality. Check it out.
Thanks for the flag. VoiceCraft is indeed the best zero-shot OSS voice cloning tool, despite appearing at the bottom of the TTS Arena. They have a really easy-to-use Gradio demo on their repo if anyone else wants to give it a try.

There is still a big gap between 11Labs and Character.ai, and the VoiceCraft voices would not be confused for the real speaker, but this is much closer.