Show HN: Super fast/cheap text-to-speech API

	Show HN: Super fast/cheap text-to-speech API (unrealspeech.com)
	43 points by jazz3020 1037 days ago

9 comments

aragonite 1037 days ago

Very impressive. I tested it with:

  In Sioux City the Taoiseach parked his coupe by the quay overlooking a fjord. Nearby, a bugle played an octave, children savored sherbet and quinoa with acai, and an artisan sold bagels next to ancient-inspired rouge.

And it got most of the (irregularly spelt) words pronounced correctly except for 'quinoa' and 'acai'.

Just for fun, I also tested some tongue twisters. For some reason, I find it psychologically very difficult to listen to perfectly spoken tongue twisters — almost as if some sort of nail on chalkboard effect is going on!

  The sixth sick sheik's sixth sheep's sick


  She sells sea-shells by the sea-shore.
  The shells she sells are sea-shells, I'm sure.
  For if she sells sea-shells by the sea-shore
  Then I'm sure she sells sea-shore shells.

ericrallen 1037 days ago

Honestly, I second guess myself any time I have to say “quinoa” or “açaí” out loud.

jazz3020 1036 days ago

OP here. Thanks for the feedback! Yeah, those are some pretty tough sentences (obviously not in the dataset). The tongue twister one is really interesting.

doomrobo 1037 days ago

How is this cheaper than Google TTS? Google's standard voices are $4 per million characters. And the deep learning ones are $16 per million characters, same as this offering. Plus Google gives you the first 1M free every month.

https://cloud.google.com/text-to-speech/pricing

jazz3020 1036 days ago

OP here. The more you use, the cheaper it gets. But if you're not using much, there aren't any cost-saving benefits.

Once you scale to ~60M characters per month, it's 50% cheaper. In other words, if you're at a stage where you spend $1,000/mo on text-to-speech, you'd spend $500/mo instead.
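
As a rough check on those numbers, here is a minimal sketch using only the rates quoted in this thread ($4 and $16 per million characters, first 1M free each month). The rates are not verified against current pricing, and `monthly_cost` is a hypothetical helper, not part of any API. Treating "50% cheaper" as a flat halving of the Google figure is also an assumption.

```python
# Rates as quoted in this thread (USD per 1M characters); not verified
# against current pricing pages.
GOOGLE_STANDARD_RATE = 4.0
GOOGLE_NEURAL_RATE = 16.0
GOOGLE_FREE_CHARS = 1_000_000  # "first 1M free every month", per the thread


def monthly_cost(chars, rate_per_million, free_chars=0):
    """Hypothetical helper: USD cost for `chars` after a free allowance."""
    billable = max(chars - free_chars, 0)
    return billable / 1_000_000 * rate_per_million


chars = 60_000_000
google_neural = monthly_cost(chars, GOOGLE_NEURAL_RATE, GOOGLE_FREE_CHARS)

# Assumption: "50% cheaper" means a flat halving of the Google figure.
half_price = google_neural * 0.5

print(f"Google neural voices at 60M chars: ${google_neural:,.2f}")
print(f"Half of that: ${half_price:,.2f}")
```

Under these assumptions, 60M characters lands close to the $1,000/mo vs. $500/mo figures mentioned above.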

crackedbassoon 1036 days ago

Do you support SSML and other languages besides English? The demo site suggests not, and I couldn’t explore further without signing up.

This service might be cheaper than Google at scale, but if I needed 60M chars a month I’d probably care about those features.

str3wer 1037 days ago

Just checked, and the free 1M is also available on this website.

crackedbassoon 1037 days ago

Only one time.

JZL003 1036 days ago

If you offered a pay-as-you-go, price-per-character option, I'd be very interested. I know small-to-medium-scale personal use isn't necessarily your target, but I turn a lot of audiobooks into audio. Even in big months it's only 1-1.5 million characters.

Or even if I could buy 1 month and then use those credits over multiple successive months, I would really start considering it. It's nicer on my ears than even GCP's new neural voices, and I've listened to over 1k hours over the past year or two.

JZL003 1036 days ago

Also, a one-time 1M does go very fast compared to GCP's 1M per month.

diggum 1036 days ago

Any eye toward allowing users to train their own voices? The only reason I’m using Elevenlabs is because I can train a suite of voices on my (legit, legal) archival content. It’s not a perfect replication of the original voice, usually, but for my purposes, this isn’t a requirement. What it does get is the artifacts, recording environment, and a large swath of the prosody and other voice elements that make it sound real and not AI.

tinytera 1036 days ago

This is actually pretty cool. I am hitting a limit with my Elevenlabs subscription soon for this app: https://news.ycombinator.com/item?id=37696033 Going to replace it with yours to see how it goes. Cheers!

latexr 1037 days ago

The pitch slider works in increments of 0.01 but for some reason we can’t set 1.0, it jumps from 0.99 to 1.01. Same with the speed slider, it goes from -1% to +1%, no 0% (or “Normal” as you call it when the page is refreshed).

jazz3020 1036 days ago

Oh, interesting. What browser are you using? I'm able to set it to 0.

latexr 1036 days ago

Safari on macOS.

anoy8888 1036 days ago

Does anyone know a good open-source alternative with natural-sounding voices?

svth 1036 days ago

English-language only text-to-speech.

pcdoodle 1037 days ago

My computer's fans ramped up visiting your site.

jazz3020 1036 days ago

Ah, that must be because of the render in the background. Maybe we should turn that off.

latexr 1036 days ago

> Maybe we should turn that off.

Definitely. Even on an M1 it made the page sluggish. I could feel its effect by moving the slider. When I deleted the div.glow-animation with the browser’s DevTools, it became way snappier.

pcdoodle 1036 days ago

Granted my GPU is from 2011. But we're still out here.
