This is a bit different. These audio clips use the default voice of each of these systems. I was asking about zero-shot voice cloning, i.e. transferring a recorded voice and synthesizing speech in that voice.
I tried zero-shot voice cloning in all of the top OSS models in the Arena and performance was bad.
Thanks for the flag. VoiceCraft is indeed the best zero-shot OSS voice cloning tool, despite appearing at the bottom of the TTS Arena. They have a really easy-to-use Gradio demo in their repo if anyone else wants to give it a try.
There is still a big gap between 11Labs/Character.ai and VoiceCraft, and the VoiceCraft voices would not be mistaken for the real speaker, but this is much closer.