Show HN: Super fast/cheap text-to-speech API (unrealspeech.com)
43 points by jazz3020 990 days ago
9 comments

Very impressive. I tested it with:

  In Sioux City the Taoiseach parked his coupe by the quay overlooking a fjord. Nearby, a bugle played an octave, children savored sherbet and quinoa with acai, and an artisan sold bagels next to ancient-inspired rouge.
And it got most of the (irregularly spelt) words pronounced correctly except for 'quinoa' and 'acai'.

Just for fun, I also tested some tongue twisters. For some reason, I find it psychologically very difficult to listen to perfectly spoken tongue twisters, almost as if some sort of nails-on-a-chalkboard effect is going on!

  The sixth sick sheik's sixth sheep's sick


  She sells sea-shells by the sea-shore.
  The shells she sells are sea-shells, I'm sure.
  For if she sells sea-shells by the sea-shore
  Then I'm sure she sells sea-shore shells.
Honestly, I second guess myself any time I have to say “quinoa” or “açaí” out loud.
OP here. Thanks for the feedback! Yeah, those are some pretty tough sentences (obviously not in the dataset). The tongue twister one is really interesting.
How is this cheaper than Google TTS? Google's standard voices are $4 per million characters. And the deep learning ones are $16 per million characters, same as this offering. Plus Google gives you the first 1M free every month.

https://cloud.google.com/text-to-speech/pricing

OP here. The more you use, the cheaper it gets. But if you're not using much, there aren't any cost savings.

Once you scale to ~60M characters per month, it's 50% cheaper. In other words, if you're at a stage where you spend $1,000/mo on text-to-speech, you'd spend $500/mo instead.
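
For anyone who wants to check the arithmetic, here is a rough sketch using only the figures quoted in this thread ($16 per million characters, 50% off at around 60M characters per month, and Google's 1M free characters per month). The function name, the flat discount applied to every character, and the assumption that Google's free allowance covers the $16 voices are all illustrative guesses, not either provider's actual billing rules.

```python
def monthly_cost(chars: int, rate_per_million: float,
                 free_chars: int = 0, discount: float = 0.0) -> float:
    """Dollar cost for one month of usage (hypothetical helper).

    free_chars: characters that are not billed (assumed monthly allowance).
    discount:   fraction taken off the bill, e.g. 0.5 for 50% (assumed flat).
    """
    billable = max(chars - free_chars, 0)
    return billable / 1_000_000 * rate_per_million * (1 - discount)


CHARS = 60_000_000  # the ~60M characters per month mentioned above

# $16 per million with no discount: roughly the "$1,000/mo" stage.
list_price = monthly_cost(CHARS, 16.0)

# The same volume with the 50% reduction: roughly "$500/mo".
discounted = monthly_cost(CHARS, 16.0, discount=0.5)

# Google's $16 voices, assuming the 1M free characters apply to them.
google_neural = monthly_cost(CHARS, 16.0, free_chars=1_000_000)

print(list_price, discounted, google_neural)  # 960.0 480.0 944.0
```

At this volume the free 1M barely moves the total, which is why it matters far more to small users than to anyone near 60M characters.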

Do you support SSML and other languages besides English? The demo site suggests not, and I couldn’t explore further without signing up.

This service might be cheaper than Google at scale, but if I needed 60M chars a month I’d probably care about those features.

Just checked, and the free 1M is also available on this website.
Only one time.
If you offered pay-as-you-go, per-character pricing, I'd be very interested. I know small-to-medium-scale personal use isn't necessarily your target, but I turn a lot of books into audio. Even in big months it's only 1-1.5 million characters.

Or even if I could buy one month and then use those credits over multiple successive months, I would really start considering it. It's nicer on my ears than even GCP's new neural voices, and I've listened to over 1k hours over the past year or two.

Also, a one-time 1M goes very fast compared to GCP's 1M per month.
Any eye toward allowing users to train their own voices? The only reason I’m using ElevenLabs is that I can train a suite of voices on my (legit, legal) archival content. It’s usually not a perfect replication of the original voice, but for my purposes that isn’t a requirement. What it does capture is the artifacts, the recording environment, and a large swath of the prosody and other voice elements that make it sound real and not AI.
This is actually pretty cool. I am about to hit a limit with my ElevenLabs subscription for this app: https://news.ycombinator.com/item?id=37696033. Going to replace it with yours to see how it goes. Cheers!
The pitch slider works in increments of 0.01, but for some reason we can’t set 1.0; it jumps from 0.99 to 1.01. Same with the speed slider: it goes from -1% to +1%, with no 0% (or “Normal”, as you call it when the page is refreshed).
Oh, interesting. What browser are you using? I'm able to set it to 0.
Safari on macOS.
Does anyone know a good open-source alternative with natural-sounding voices?
English-language only text-to-speech.
My computer's fans ramped up when I visited your site.
Ah, that must be because of the render in the background. Maybe we should turn that off.
> Maybe we should turn that off.

Definitely. Even on an M1 it made the page sluggish. I could feel its effect by moving the slider. When I deleted the div.glow-animation with the browser’s DevTools, it became way snappier.

Granted my GPU is from 2011. But we're still out here.