| HN Mirror


by reissbaker 890 days ago

The quality difference between Eleven and OpenAI is IMO pretty small, but the price difference is enormous: for 50,000 characters (approx. 1 hr of audio, by Eleven's estimates), you'd pay Eleven Labs $9, assuming you're in their highest $330/month commitment tier. OpenAI has no minimum commitment, and the same number of characters would cost $0.75.

If you're generating speech once and replaying it many times (e.g. making podcasts), the price difference is negligible in absolute terms and you might as well go with Eleven Labs, since it's more customizable and possibly slightly higher quality. If you're doing interactive speech with customers, $9/hr is incredibly expensive (higher than hiring a minimum-wage worker in the U.S.!), and OpenAI's TTS is a very close second best and much more reasonably priced. If you're trying to integrate speech into an AI product, Eleven makes your hourly costs pretty infeasible, since at minimum you have to charge your customers more than it would cost to hire a human being to do the task.

Azure's "Neural" line of TTS is the best of the big cloud offerings, but it's pretty mediocre compared to either OpenAI or Eleven Labs IMO. And it's actually more expensive than using OpenAI: it's $0.80 for 50,000 characters (~1hr), unless you're willing to commit to over $1k monthly spend, at which point it's barely cheaper than OpenAI at $0.64 per 50k characters.

OpenAI's TTS is IMO the best option for anything interactive, since it's so much higher quality than Azure's Neural TTS and so much cheaper than Eleven Labs (with very little quality difference).
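As a rough sanity check on the arithmetic above, here is a minimal Python sketch. The prices are the ones quoted in this comment (per 50,000 characters) and will have changed since; the 50,000 characters per hour figure is Eleven's estimate, and the function names are made up for illustration.

```python
# Prices in USD per 50,000 characters, as quoted in the comment above.
# These are snapshot figures from the thread, not current list prices.
CHARS_PER_HOUR = 50_000  # Eleven's estimate: roughly 1 hour of audio

PRICE_PER_50K = {
    "Eleven Labs ($330/mo tier)": 9.00,
    "OpenAI": 0.75,
    "Azure Neural": 0.80,
    "Azure Neural (>$1k/mo commitment)": 0.64,
}


def cost_per_hour(price_per_50k: float, chars_per_hour: int = CHARS_PER_HOUR) -> float:
    """Hourly cost of generated speech at the given per-50k-character price."""
    return price_per_50k * chars_per_hour / 50_000


def cost_per_million_chars(price_per_50k: float) -> float:
    """Same price expressed per 1,000,000 characters."""
    return price_per_50k * 1_000_000 / 50_000


if __name__ == "__main__":
    baseline = PRICE_PER_50K["OpenAI"]
    for name, price in PRICE_PER_50K.items():
        print(
            f"{name}: ${cost_per_hour(price):.2f}/hr, "
            f"${cost_per_million_chars(price):.2f} per 1M chars, "
            f"{price / baseline:.2f}x OpenAI"
        )
```

On these numbers Eleven Labs works out to 12x OpenAI's price per character, which is the gap that matters for interactive use.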


BeetleB 889 days ago

For anyone reading, in case you want a whole order of magnitude cheaper, just go with Google Cloud TTS. For many voices, you get 1 million characters free per month, and even beyond that it's ridiculously cheap. Some voices do sound artificial, but many sound quite human - the only tells are the relatively consistent tone and the lack of appropriate pauses at section ends.

I don't read long articles any more. I have a script that extracts the text, does TTS via Google Cloud, and adds it to my podcast so I can listen to it while driving. Been doing this for months and haven't paid a cent.


reissbaker 889 days ago

Azure has a half-million character free tier for their top-quality "Neural" voices, which I find somewhat better than Google Cloud's top tier of voices ("Studio" voices). For personal use you can probably just use Azure for free too!

If you're running a business you'll probably burn through the free tiers of either of them, and Google is wayyy more expensive — roughly $8/hr for Studio voices using the 50k characters per hour estimate. The "Neural2" voices are competitively priced with OpenAI and Azure, but are pretty low quality compared to even Azure (and much worse than OpenAI).
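To put the free tiers and the Studio price in the same units, here is a small Python sketch. It assumes the 50k-characters-per-hour estimate used throughout this thread, the free-tier sizes quoted here (1 million characters for Google, half a million for Azure Neural), and Azure's $0.80 per 50k characters from the earlier comment. The 100 hours/month workload is a made-up example, and since the thread doesn't say which Google voices the free allowance covers, the Studio line below applies no free tier.

```python
CHARS_PER_HOUR = 50_000  # estimate used in this thread


def free_hours(free_chars: int) -> float:
    """Hours of audio covered by a monthly free tier."""
    return free_chars / CHARS_PER_HOUR


def monthly_cost(chars_used: int, free_chars: int, price_per_50k: float) -> float:
    """Monthly bill after subtracting the free tier (never below zero)."""
    billable = max(0, chars_used - free_chars)
    return billable / 50_000 * price_per_50k


if __name__ == "__main__":
    print(f"Google free tier: ~{free_hours(1_000_000):.0f} hours/month")
    print(f"Azure free tier:  ~{free_hours(500_000):.0f} hours/month")

    # Hypothetical business workload: 100 hours of speech per month.
    usage = 100 * CHARS_PER_HOUR
    print(f"Azure Neural:  ${monthly_cost(usage, 500_000, 0.80):.2f}")
    # Assumption: no free allowance applied to Studio voices.
    print(f"Google Studio: ${monthly_cost(usage, 0, 8.00):.2f}")
```

For personal use (a few hours a month) both come out to zero; at business volumes the free tier barely moves the total.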


stavros 889 days ago

That's a good suggestion, thank you. Would it be possible to post some code? I've found GCP's APIs/documentation to be a bit abstruse.


BeetleB 889 days ago

Actually, all the code related to this was copied from the docs, almost verbatim.

(But yes, their docs suck).
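For anyone who wants a starting point anyway, here is a minimal sketch in the spirit of the docs quickstart. It is not the actual script described above. It assumes the `google-cloud-texttospeech` package is installed and application default credentials are configured. The voice name is only an example, and the 5,000-byte per-request input limit is an assumption worth checking against the current quota docs. The helper names are made up.

```python
import re


def chunk_text(text: str, max_bytes: int = 5000) -> list[str]:
    """Split text on sentence boundaries into chunks of at most max_bytes.

    A single sentence longer than max_bytes is passed through unsplit.
    """
    chunks, current = [], ""
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        candidate = f"{current} {sentence}".strip()
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def synthesize_to_mp3(text: str, out_path: str, voice_name: str = "en-US-Neural2-C") -> None:
    """Synthesize text chunk by chunk and append the MP3 bytes to out_path."""
    # Imported here so chunk_text can be used without the client library.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    voice = texttospeech.VoiceSelectionParams(language_code="en-US", name=voice_name)
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )
    with open(out_path, "wb") as out:
        for chunk in chunk_text(text):
            response = client.synthesize_speech(
                input=texttospeech.SynthesisInput(text=chunk),
                voice=voice,
                audio_config=audio_config,
            )
            # Naive concatenation of MP3 segments; most players cope with it.
            out.write(response.audio_content)
```

Something like `synthesize_to_mp3(article_text, "article.mp3")` would produce one file per article. Extracting the article text and publishing to a podcast feed are left out here.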


ametrau 889 days ago

None of their available voices are as good as MS's.


BeetleB 889 days ago

ms?


reissbaker 886 days ago

MS = Microsoft (presumably the Azure Neural TTS, which I agree is better than Google's TTS, although worse than OpenAI).
