In Sioux City the Taoiseach parked his coupe by the quay overlooking a fjord. Nearby, a bugle played an octave, children savored sherbet and quinoa with acai, and an artisan sold bagels next to ancient-inspired rouge.
And it got most of the (irregularly spelt) words pronounced correctly except for 'quinoa' and 'acai'.
Just for fun, I also tested some tongue twisters. For some reason, I find it psychologically very difficult to listen to perfectly spoken tongue twisters — almost as if some sort of nail on chalkboard effect is going on!
The sixth sick sheik's sixth sheep's sick.
She sells sea-shells by the sea-shore.
The shells she sells are sea-shells, I'm sure.
For if she sells sea-shells by the sea-shore
Then I'm sure she sells sea-shore shells.
OP here. Thanks for the feedback! Yeah, those are some pretty tough sentences (obviously not in the dataset). The tongue twister one is really interesting.
How is this cheaper than Google TTS? Google's standard voices are $4 per million characters. And the deep learning ones are $16 per million characters, same as this offering. Plus Google gives you the first 1M free every month.
OP here. The more you use, the cheaper it gets. But if you're not using much, there aren't any cost savings.
Once you scale to ~60M characters per month, it's 50% cheaper. In other words, if you're at a stage where you spend $1,000/mo on text-to-speech, you'd spend $500/mo instead.
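To put rough numbers on that, here is a minimal Python sketch using only the prices quoted in this thread ($16 per million characters, Google's first 1M characters free). The function names and the flat 50% discount at volume are assumptions for illustration; the actual tier structure isn't spelled out here.

```python
# Prices quoted in this thread, in USD per million characters.
PRICE_PER_M = 16.0             # deep-learning voices, both offerings
GOOGLE_FREE_CHARS = 1_000_000  # Google's free monthly allowance, per the comment above


def google_neural_cost(chars: int) -> float:
    """Monthly cost on Google's deep-learning voices after the free allowance."""
    billable = max(0, chars - GOOGLE_FREE_CHARS)
    return billable * PRICE_PER_M / 1_000_000


def list_price_cost(chars: int) -> float:
    """Monthly cost at the flat $16/M list price, with no free allowance."""
    return chars * PRICE_PER_M / 1_000_000


def discounted_cost(chars: int, discount: float = 0.5) -> float:
    """Assumption: a flat discount off list price once you reach high volume."""
    return list_price_cost(chars) * (1 - discount)


if __name__ == "__main__":
    chars = 60_000_000
    print(f"Google neural:   ${google_neural_cost(chars):,.2f}")
    print(f"List price:      ${list_price_cost(chars):,.2f}")
    print(f"With 50% off:    ${discounted_cost(chars):,.2f}")
```

At 60M characters the list price works out to $960/mo, which lines up with the "$1,000/mo becomes $500/mo" figure. At 1-1.5M characters a month, Google's free allowance covers most of the bill, which is why low-volume users see no saving.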
If you offered pay-as-you-go, per-character pricing, I'd be very interested. I know small-to-medium-scale personal use isn't necessarily your target, but I turn a lot of books into audio. Even in big months it's only 1-1.5 million characters.
Or even if I could buy one month and then use those credits over several successive months, I would really start considering it. It's nicer on my ears than even GCP's new neural voices, and I've listened to over 1k hours over the past year or two.
Any eye toward allowing users to train their own voices? The only reason I’m using Elevenlabs is because I can train a suite of voices on my (legit, legal) archival content. It’s usually not a perfect replication of the original voice, but for my purposes that isn’t a requirement. What it does capture is the artifacts, the recording environment, and a large swath of the prosody and other voice elements that make it sound real and not AI.
This is actually pretty cool. I am hitting a limit with my elevenlabs subscription soon for this app https://news.ycombinator.com/item?id=37696033 Going to replace with yours to see how it goes. Cheers!
The pitch slider works in increments of 0.01 but for some reason we can’t set 1.0, it jumps from 0.99 to 1.01. Same with the speed slider, it goes from -1% to +1%, no 0% (or “Normal” as you call it when the page is refreshed).
Definitely. Even on an M1 it made the page sluggish. I could feel its effect by moving the slider. When I deleted the div.glow-animation with the browser’s DevTools, it became way snappier.