by zamadatix 385 days ago
The (American English) voices are absolutely amazing, but the tags for laughs still feel more like an "inserted dedicated laugh section" than a "laugh at this point in speaking" type of thing. I.e., it can't seem to reliably giggle while saying a word; it can "just" giggle leading up to a word.

They're also still too expensive, and that's creating a lot of opportunity for other players.

Even though ElevenLabs remains the quality leader, the others aren't that far behind.

There are even a bunch of good TTS models being released as fully open source, especially by cutting-edge Chinese labs and companies, perhaps in a bid to cut off the legs of American AI companies or to commoditize their complement. Whatever the case, it's great for consumers.

YCombinator-backed PlayHT has been releasing some of their good stuff too.

What would you say are some of the best open source TTS models? Chatterbox, maybe?
Could you list 2 or 3 of the ones you think offer the best quality per dollar?
Kokoro is the best open TTS I've tried.
If you edit the text so that the laugh makes sense in context, it should be much more natural, like this one: https://x.com/elevenlabsio/status/1930689782331412811
The first laugh in that "<LAUGHS> Hey, Dr. Von Fusion" is a dedicated laugh section, which the model does extremely well, but it works because that's a natural place to laugh before actually speaking the following words. Skip ahead to "...robot chuckle. Jessica: <LAUGHS> I know right!" and you get an awkwardly timed and toned light chuckle, completely separated from the "I know" you'd naturally continue saying while making that chuckle.

You can always rewrite the text to avoid places where one would naturally laugh through the next couple of words, but that's just avoiding the problem and doing a different kind of laugh instead.

She is laughing through the "I know", though.
have to say that this human can't tell the difference between this and other real humans so...