
	by zamadatix 385 days ago
	The (American English) voices are absolutely amazing, but the tags for laughs still feel more like an "inserted dedicated laugh section" than a "laugh at this point in speaking" type of thing. I.e., it can't seem to reliably giggle while saying a word; it can "just" giggle leading up to a word.

2 comments

echelon 385 days ago

They're also still too expensive, and that's creating a lot of opportunity for other players.

Even though ElevenLabs remains the quality leader, the others aren't that far behind.

There are even a bunch of good TTS models being released as fully open source, especially by cutting-edge Chinese labs and companies, perhaps in a bid to cut the legs out from under American AI companies or to commoditize their complement. Whatever the case, it's great for consumers.

YCombinator-backed PlayHT has been releasing some of their good stuff too.

taf2 385 days ago

What would you say are some of the best open source TTS models? Chatterbox, maybe?

jsemrau 385 days ago

I had good results with NeMo + XTTS-v2.

https://docs.nvidia.com/nemo-framework/user-guide/latest/nem...

https://huggingface.co/coqui/XTTS-v2

monkeywork 385 days ago

Could you list two or three of the ones you think offer the best quality per dollar?

stavros 384 days ago

Kokoro is the best open TTS I've tried.

lharries 385 days ago

If you edit the text so that the laugh makes sense in context, it should sound much more natural, like this one: https://x.com/elevenlabsio/status/1930689782331412811

zamadatix 385 days ago

The first laugh in that "<LAUGHS> Hey, Dr. Von Fusion" is a dedicated laugh section, which the model does extremely well, but it works because that's a natural place to laugh before actually speaking the following words. Skip ahead to "...robot chuckle. Jessica: <LAUGHS> I know right!" and you get an awkwardly timed and toned light chuckle, completely separated from the "I know" you'd naturally continue saying while making that chuckle.

You can always rewrite the text to avoid places where one would naturally laugh through the next couple of words, but that's just sidestepping the problem and doing a different kind of laugh instead.

stavros 384 days ago

She is laughing through the "I know", though.

Davidzheng 385 days ago

Have to say that this human can't tell the difference between this and other real humans, so...
