HN Mirror


	by modeless 513 days ago
	Why would you pirate a TTS service when there are so many great options for local open source TTS now? Models like Fish and Kokoro and StyleTTSv2 are great and very fast. Click the leaderboard tab here: https://huggingface.co/spaces/TTS-AGI/TTS-Arena

5 comments

itake 513 days ago

The models you shared support only the top ~10 languages, or English only.

I believe the Edge API supports more models:

https://gist.github.com/BettyJJ/17cbaa1de96235a7f5773b8690a2...

Do you know of any commercially licensed TTS options that support 50+ languages and are relatively small (e.g. many small models, not 1 big model)? Meta's open models support something like 300 languages, but the license doesn't permit commercial use :-/

archerx 513 days ago

I have been experimenting with Piper TTS recently. It's free, open source, fast, and has a lot of voices in different languages. The quality is not the best, but it's still good enough for most cases.

https://rhasspy.github.io/piper-samples/

magicalhippo 513 days ago

For my native language, Norwegian, Piper TTS is at best "usable", and sometimes a fair bit worse than that. At least in its default form[1].

The rhythm and timing in particular are often very jarring, making words difficult to understand, especially when the pitch is not quite right.

It also doesn't seem to know about pacing, ignoring semicolons and commas.

Combined, I often need to think hard about what it just said, or even listen to it again.

I also notice these issues in the various English voice models to varying degrees, so it seems to be an inherent problem. Or can it be improved significantly by training it yourself?

[1]: https://rhasspy.github.io/piper-samples/

archerx 513 days ago

I don’t know about Norwegian but I wonder if the issues are due to the training data.

I’m sure it’s possible to train new voices.

The English voices are hit or miss, but some voices have up to 900 speakers, so it should be possible to find a nice voice in the haystack.

The thing I like about piper is it is so fast. I set it up to stream the output to VLC and it starts speaking in less than a second even on my laptop.

I wish it could have ElevenLabs quality, but right now the speed is the most important factor for what I’m doing with it.
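A minimal sketch of the streaming setup described above, not the commenter's actual configuration. It assumes the `piper` CLI is on the PATH and uses the `--output-raw` streaming mode with `aplay` as the player, following the example in the Piper README; the comment mentions VLC, which is swapped out here. The function names are hypothetical, and the 22050 Hz sample rate is an assumption that depends on the voice model.

```python
import subprocess


def build_stream_commands(model_path, player_cmd=None):
    """Return (piper_cmd, player_cmd) argument lists for streaming playback."""
    # Piper writes raw 16-bit PCM to stdout when given --output-raw.
    piper_cmd = ["piper", "--model", model_path, "--output-raw"]
    if player_cmd is None:
        # Assumption: 22050 Hz mono, as in the Piper README example.
        # Some voices use a different sample rate; check the voice's config.
        player_cmd = ["aplay", "-r", "22050", "-f", "S16_LE", "-t", "raw", "-"]
    return piper_cmd, list(player_cmd)


def speak(text, model_path, player_cmd=None):
    """Pipe text into Piper and stream its audio into the player."""
    piper_cmd, player_cmd = build_stream_commands(model_path, player_cmd)
    piper = subprocess.Popen(
        piper_cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE
    )
    player = subprocess.Popen(player_cmd, stdin=piper.stdout)
    # Close our copy of the pipe so the player sees EOF when Piper exits.
    piper.stdout.close()
    piper.stdin.write(text.encode("utf-8") + b"\n")
    piper.stdin.close()
    player.wait()
    piper.wait()
```

Calling `speak("Hello there.", "en_US-lessac-medium.onnx")` would start playback as soon as Piper emits the first audio, provided that voice file has been downloaded. Any player that accepts raw PCM on stdin can be passed as `player_cmd` instead.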

magicalhippo 513 days ago

I saw that the piper-phonemize project linked to espeak-ng, so I tried passing the Piper sample text through espeak-ng. The way it phonemized the text had the same rhythm issues that I noted in the TTS sample, i.e. it put the stresses in the same wrong places in certain words and such.

This was also reflected in the voice output of espeak-ng, even though its overall quality was vastly subpar compared to Piper TTS (as expected).

So it seems that improving this aspect might be one way to get better performance out of Piper for my language. Not sure how easy that'll be tho...
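One way to reproduce this kind of check is sketched below. It assumes `espeak-ng` is installed and that `nb` is the identifier for the Norwegian Bokmål voice; the helper names are hypothetical. It uses the `-q` (no audio) and `--ipa` (print IPA phonemes) options, then lists where the primary-stress mark falls in each word so misplaced stresses are easier to spot.

```python
import subprocess

PRIMARY_STRESS = "\u02c8"  # IPA primary-stress mark


def espeak_ipa_command(text, voice="nb"):
    """Build the espeak-ng call that prints IPA without speaking."""
    # Assumption: "nb" selects Norwegian Bokmål in the local espeak-ng install.
    return ["espeak-ng", "-q", "--ipa", "-v", voice, text]


def phonemize(text, voice="nb"):
    """Run espeak-ng and return its IPA transcription as a string."""
    result = subprocess.run(
        espeak_ipa_command(text, voice),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def stress_positions(ipa_word):
    """Return the character indices of primary-stress marks in one word."""
    return [i for i, ch in enumerate(ipa_word) if ch == PRIMARY_STRESS]


def stress_report(ipa_line):
    """Pair each whitespace-separated IPA word with its stress positions."""
    return [(word, stress_positions(word)) for word in ipa_line.split()]
```

Running `stress_report(phonemize("some sample sentence"))` and reading the result next to a native speaker's expectation shows which words carry stress on the wrong syllable before any audio model is involved.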

rolfus 513 days ago

What TTS model has given the best results for you (for Norwegian)? I've tried MS Azure and it's pretty good, but not flawless.

magicalhippo 513 days ago

I haven't found any open source models that come close to the commercial offerings, though I admit I haven't tried 'em all.

Azure, like you say, is pretty decent. Google does an OK job, but not as good.

lupusreal 512 days ago

Piper is superb for my needs. It runs extremely fast on CPU (so fast it can run in real time on a raspi), so it's perfect for use on laptops without dedicated GPUs. Subjectively, I'd say the quality is about on par with where macOS's TTS was about 10 years ago, which is extremely usable.

deadprogram 513 days ago

I also have used Piper and agree it is worth trying out.

willwade 513 days ago

https://ttsvoicesavailable.streamlit.app

Acapela, Nuance - but it's around 75 languages.

itake 512 days ago

I really want Southeast Asian languages (Thai, Lao, etc.). It seems only MS supports those.

depr 512 days ago

Isn't that Nuance product EOL?

modeless 513 days ago

I don't know, but the Edge API is not licensed for any use, commercial or otherwise (outside of Edge itself).

userbinator 513 days ago

"pirate"? This was always free.

modeless 513 days ago

The API endpoint was clearly intended for use only by Edge. Yes, I consider reverse engineering the authentication (even if trivial) and using it for other applications, knowing that was not its intended use, a form of piracy.

itake 513 days ago

I'm not really sure how this is any different from a web crawler. I guess the issue would be that republishing the content is bad.

But I thought the LinkedIn lawsuit settled that crawlers are ok, as long as you're not republishing the content?

userbinator 513 days ago

That is a very hazardous slope to go down. We are already seeing user-agent discrimination and this is no different than using Bing from a browser that isn't Edge.

TOMDM 513 days ago

If Bing weren't a public website and were only accessible through the Windows search bar/Edge without reverse engineering the API, I'd agree with you.

Comparing an API that typically requires a key and a public website is absurd.

userbinator 513 days ago

It's still publicly accessible.

noja 513 days ago

Typing anything with “r” into that text-to-speech box gives a random sentence instead.

natebc 513 days ago

Is Kokoro open source? I couldn't find its source anywhere.

homarp 513 days ago

Or try it directly: https://kokorotts.com/ or https://huggingface.co/spaces/hexgrad/Kokoro-TTS
