by wczekalski 950 days ago
Have you tested longer utterances with both ElevenLabs and StyleTTS? Short audio synthesis is a ~solved problem in the TTS world, but things start falling apart once you want to do something like create an audiobook with text-to-speech.

I can say that the paid service from ElevenLabs does long-form TTS very well. I used it for a while to convert long articles to audio so I could listen to them later instead of reading. I only stopped because it gets a little pricey.
The OpenAI API is ten times cheaper and a fair bit faster.

Also, ElevenLabs keeps diverging for me and starts mispronouncing words after two or three sentences.