by miki123211 885 days ago
If you think OpenAI's TTS is impressive, you should check out Eleven Labs. They have the highest quality models IMO. Voice quality, emotional awareness / inflection and support for foreign languages are top-notch, and it's that last point that OpenAI seems to have the most issues with. If you find a good voice to clone, the latest models can even replicate somewhat unusual accents and speaking styles.

For plain old English TTS with a stock voice, there isn't that much of a difference (although Eleven Labs still wins IMO), but if you need either voice cloning or foreign language support, nothing else comes even close.

With that said, Eleven is extremely pricey. Something like Azure TTS (which is the best among the cheap options) may be a better fit for less demanding applications.


The quality difference between Eleven and OpenAI is IMO pretty small, but the price difference is enormous: for 50,000 characters (approx 1hr of audio, by Eleven's estimates), you'd pay Eleven Labs $9 assuming you're in their highest $330/month payment commitment tier; for OpenAI there's no minimum commitment and the same number of characters would cost $0.75.
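To make the arithmetic explicit, here's a rough sketch using only the figures quoted above (50,000 characters per hour of audio is Eleven's estimate; the prices are the ones in this comment and may well be out of date). The function and constant names are just illustrative.

```python
# Rough per-hour cost comparison using the figures quoted in the comment above.
# Assumption: ~50,000 characters is about one hour of audio (Eleven's estimate).
CHARS_PER_HOUR = 50_000

def cost_per_hour(price_usd, chars):
    """Convert 'price for N characters' into an approximate price per hour of audio."""
    return price_usd / chars * CHARS_PER_HOUR

eleven = cost_per_hour(9.00, 50_000)   # Eleven Labs, top commitment tier (quoted above)
openai = cost_per_hour(0.75, 50_000)   # OpenAI TTS (quoted above)

print(f"Eleven Labs: ${eleven:.2f}/hr")
print(f"OpenAI:      ${openai:.2f}/hr")
print(f"Ratio:       {eleven / openai:.0f}x")
```

With those numbers, Eleven Labs comes out at 12 times the OpenAI price per hour of generated audio.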

If you're generating speech once and replaying it many times (e.g. making podcasts), the difference is negligible and you might as well go with Eleven Labs, since it's more customizable and possibly slightly higher quality. If you're doing interactive speech with customers, $9/hr is incredibly expensive (higher than the $7.25/hr U.S. federal minimum wage!), and OpenAI's TTS is a very close second best and much more reasonably priced. If you're trying to integrate speech into an AI product, Eleven makes your hourly costs pretty infeasible, since at minimum you have to charge your customers more than it would cost to hire a human being to do the task.

Azure's "Neural" line of TTS is the best of the big cloud offerings, but it's pretty mediocre compared to either OpenAI or Eleven Labs IMO. And it's actually more expensive than using OpenAI: it's $0.80 for 50,000 characters (~1hr), unless you're willing to commit to over $1k monthly spend, at which point it's barely cheaper than OpenAI at $0.64 per 50k characters.

OpenAI's TTS is IMO the best option for anything interactive, since it's so much higher quality than Azure's Neural TTS and so much cheaper (with very little quality difference) as compared to Eleven Labs.

For anyone reading, in case you want a whole order of magnitude cheaper, just go with Google Cloud TTS. For many voices, you get 1 million characters free per month, and even beyond that it's ridiculously cheap. Some voices do sound artificial, but many sound quite human. The only tells are the relatively consistent tone and the lack of appropriate pauses at section ends.

I don't read long articles any more. I have a script that extracts the text, does TTS via Google Cloud, and adds it to my podcast so I can listen to it while driving. Been doing this for months and haven't paid a cent.

Azure has a half-million character free tier for their top-quality "Neural" voices, which I find somewhat better than Google Cloud's top tier of voices ("Studio" voices). For personal use you can probably just use Azure for free too!

If you're running a business you'll probably burn through the free tiers of either of them, and Google is wayyy more expensive: roughly $8/hr for Studio voices using the 50k characters per hour estimate. The "Neural2" voices are competitively priced with OpenAI and Azure, but are pretty low quality compared to even Azure (and much worse than OpenAI).
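For a feel of how quickly the free tiers run out, here's a back-of-the-envelope sketch. It only uses numbers mentioned in this thread (500k free characters for Azure Neural at about $0.80/hr, 1M free characters for Google, about $8/hr for Studio voices). Whether Google's 1M allowance actually applies to Studio voices is an assumption on my part, so treat that line as illustrative and check the current pricing pages.

```python
# Back-of-the-envelope monthly cost, using figures quoted in this thread.
# Assumption: ~50,000 characters is about one hour of audio.
CHARS_PER_HOUR = 50_000

def free_hours(free_chars):
    """Approximate hours of audio covered by a monthly free allowance."""
    return free_chars / CHARS_PER_HOUR

def monthly_cost(chars_used, free_chars, usd_per_hour):
    """Cost of a month's usage after subtracting the free allowance."""
    billable = max(0, chars_used - free_chars)
    return billable / CHARS_PER_HOUR * usd_per_hour

usage = 100 * CHARS_PER_HOUR  # hypothetical business workload: 100 hours/month

azure = monthly_cost(usage, 500_000, 0.80)     # Azure Neural, pay-as-you-go rate
google = monthly_cost(usage, 1_000_000, 8.00)  # Google Studio; free tier assumed to apply

print(f"Azure free tier covers ~{free_hours(500_000):.0f} hr/month")
print(f"Google free tier covers ~{free_hours(1_000_000):.0f} hr/month")
print(f"100 hr/month: Azure ${azure:.2f}, Google Studio ${google:.2f}")
```

For personal use (a few hours a month) both come out at zero, which matches the experience described above. The gap only shows up once the usage is well past the free allowance.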

That's a good suggestion, thank you. Would it be possible to post some code? I've found GCP's APIs/documentation to be a bit abstruse.
Actually, all the code related to this was copied from the docs, almost verbatim.

(But yes, their docs suck).
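Since code was asked for, here's a minimal sketch of just the TTS step, not the actual script described above. It assumes the official `google-cloud-texttospeech` Python client and credentials already set up via Application Default Credentials. The voice name, the 4,500-byte chunk size (picked to stay under the per-request input limit, which I believe is 5,000 bytes, but check the quota docs) and the helper names are all my own placeholders.

```python
import re

# Assumption: keep each request safely under the API's per-request input limit.
MAX_BYTES = 4500

def chunk_text(text, max_bytes=MAX_BYTES):
    """Group sentences into chunks of up to max_bytes (UTF-8).

    A single sentence longer than max_bytes is passed through unsplit.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks

def synthesize_to_mp3(text, out_path, voice_name="en-US-Neural2-C"):
    """Synthesize text chunk by chunk and append the MP3 bytes to one file."""
    # Imported here so the chunking helper works without the library installed.
    from google.cloud import texttospeech  # pip install google-cloud-texttospeech

    client = texttospeech.TextToSpeechClient()
    voice = texttospeech.VoiceSelectionParams(language_code="en-US", name=voice_name)
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )
    with open(out_path, "wb") as out:
        for chunk in chunk_text(text):
            response = client.synthesize_speech(
                input=texttospeech.SynthesisInput(text=chunk),
                voice=voice,
                audio_config=audio_config,
            )
            out.write(response.audio_content)
```

Usage would be something like `synthesize_to_mp3(article_text, "article.mp3")`. Naively concatenating MP3 segments plays fine in the players I'd expect people to use, but if yours is picky you may want to stitch the chunks with a proper audio tool instead.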

None of their available voices are as good as ms
ms?
MS = Microsoft (presumably the Azure Neural TTS, which I agree is better than Google's TTS, although worse than OpenAI).
Maybe I’m not a good judge but OpenAI’s voices sound very natural to me and seem better than Eleven Labs.