Hacker News
by ax8080 1235 days ago
WOW! What do you use to generate the voice? It's SO scarily similar to real podcasts. I couldn't find it in the sources after a minute of looking.
4 comments

And it's funny how it sometimes makes ahhrhrhrhrhhhhh sounds. What is the reason behind that?
It happens quite often with TorToiSe that it collapses in this way, especially for unseen tokens that wouldn't have appeared in the training data, which likely consisted of a lot of transcribed material and read text like audiobooks. Trying to make it laugh by prompting it with "hahaha" (which you won't really see in that kind of data) often gets you demon and zombie noises.
I wonder why the title says that it uses Tortoise TTS?

Also interesting that play.ht allows you to clone others' voices.

How did they get to use the Joe Rogan voice, though? It seems that one isn’t public?
It uses the TorToiSe TTS model for generation. It's simple to generate conditioning voice latents from a few short audio samples of the target speaker. Transcribed JRE episodes were likely part of the TorToiSe training data, which would explain why it's so good at recreating his voice characteristics in particular.
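For anyone curious what "generating conditioning latents from short samples" looks like, here is a rough sketch, not the project's actual code. It assumes the open-source tortoise-tts package as I remember its API (`TextToSpeech`, `load_audio`, `get_conditioning_latents`), which may differ between versions. The helper names `find_reference_clips` and `voice_latents`, and the idea of a folder of `.wav` clips, are hypothetical and only there for illustration.

```python
from pathlib import Path


def find_reference_clips(voice_dir, max_clips=5):
    """Return up to max_clips .wav files from voice_dir, sorted by name.

    Hypothetical helper: the folder layout is an assumption for this sketch.
    """
    clips = sorted(Path(voice_dir).glob("*.wav"))
    if not clips:
        raise FileNotFoundError(f"no .wav reference clips in {voice_dir}")
    return clips[:max_clips]


def voice_latents(voice_dir):
    """Compute conditioning latents for the speaker in voice_dir.

    Assumes tortoise-tts is installed. The import is done lazily so the
    helper above still works without it.
    """
    from tortoise.api import TextToSpeech
    from tortoise.utils.audio import load_audio

    # load_audio expects a string path; 22050 Hz is the rate used in the
    # tortoise-tts README example.
    samples = [load_audio(str(p), 22050) for p in find_reference_clips(voice_dir)]

    tts = TextToSpeech()  # loads (and on first use downloads) the model weights
    latents = tts.get_conditioning_latents(samples)
    return tts, latents
```

If the API matches, the latents can then be reused across generations with something like `tts.tts_with_preset("some text", conditioning_latents=latents, preset="fast")`, so the reference clips only need to be processed once per voice.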
That generation uses tortoise-tts. Play.ht has a model called peregrine, and I've taken to using a script to call them out. Super cool company & API. I just haven't had time to get my next-gen version out.