Hacker News


	by ax8080 1283 days ago
	WOW! What do you use to generate the voice? It's SO scarily similar to real podcasts. I couldn't find it after a minute of looking through the sources.

4 comments

ax8080 1283 days ago

And it's funny how it sometimes makes "ahhrhrhrhrhhhhh" sounds. What is the reason behind that?


nielsinho 1282 days ago

TorToiSe collapses in this way quite often, especially on unseen tokens that wouldn't have appeared in the training data, which likely consisted of a lot of transcribed material and read text such as audiobooks. Trying to make it laugh by prompting it with "hahaha" (which you won't really see in that kind of data) often gets you demon and zombie noises.


carlbarrdahl 1282 days ago

It's making an API request to play.ht:

https://github.com/yacineMTB/scribepod/blob/master/playht.ts...


windsignaling 1282 days ago

I wonder why the title says that it uses Tortoise TTS?

Also interesting that play.ht allows you to clone others' voices.


tehsauce 1282 days ago

How did they get to use the Joe Rogan voice though? It seems that one isn’t public?


nielsinho 1282 days ago

It uses the TorToiSe TTS model for generation. It's simple to generate conditioning voice latents from short audio samples. Transcribed JRE episodes were likely part of the TorToiSe training data, which would explain why it's so good at recreating his voice characteristics in particular.


yacine_ 1282 days ago

That generation uses tortoise-tts. Play.ht has a model called Peregrine; I've taken to using a script to call them out. Super cool company & API. I just haven't had time to get my next-gen version out.


qup 1282 days ago

Play.ht
