Why would you pirate a TTS service when there are so many great options for local open source TTS now? Models like Fish and Kokoro and StyleTTSv2 are great and very fast.
Do you know of any commercially licensed TTS that supports 50+ languages and is relatively small (e.g. many small models, not one big model)? Meta's open models support something like 300 languages, but the license doesn't permit commercial use :-/
I have been experimenting with Piper TTS recently. It's free, open source, fast, and has a lot of voices in different languages. The quality is not the best, but it's still good enough for most cases.
For my native language, Norwegian, Piper TTS is at best "usable", and sometimes a fair bit worse than that. At least in its default form[1].
The rhythm and timing in particular are often very jarring, making words difficult to understand, especially when the pitch is not quite right.
It also doesn't seem to know about pacing, ignoring semicolons and commas.
Combined, this means I often need to think hard about what it just said, or even listen to it again.
I also notice these issues in the various English voice models to varying degrees, so it seems to be an inherent problem. Or can it be improved significantly by training it yourself?
I saw that the piper-phonemize project linked to espeak-ng, so I tried passing the Piper sample text through espeak-ng, and the way it phonemized the text had the same rhythm issues that I noted in the TTS sample. I.e., it put the stresses in the same wrong places in certain words and such.
This was also reflected in the voice output of espeak-ng, even though its overall quality was far below that of Piper TTS (as expected).
So it seems that improving this aspect might be one way to get better performance out of Piper for my language. Not sure how easy that'll be tho...
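For anyone who wants to repeat the espeak-ng check described above, here is a minimal sketch. It assumes `espeak-ng` is installed and on your PATH, and that `nb` is the Norwegian Bokmål voice name on your install; the helper names (`espeak_ipa_command`, `phonemize`, `stressed_words`) are made up for illustration and are not part of Piper or piper-phonemize.

```python
import shutil
import subprocess

PRIMARY_STRESS = "\u02c8"  # IPA primary stress mark


def espeak_ipa_command(text, voice="nb"):
    """Build the espeak-ng command line that prints IPA without playing audio."""
    # -q: no audio output, --ipa: print IPA phonemes, -v: voice name (assumed "nb")
    return ["espeak-ng", "-q", "--ipa", "-v", voice, text]


def phonemize(text, voice="nb"):
    """Return espeak-ng's IPA transcription, or None if espeak-ng is not installed."""
    if shutil.which("espeak-ng") is None:
        return None
    result = subprocess.run(
        espeak_ipa_command(text, voice),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def stressed_words(ipa):
    """For each whitespace-separated word, list the offsets of its primary stress marks."""
    return [
        [i for i, ch in enumerate(word) if ch == PRIMARY_STRESS]
        for word in ipa.split()
    ]


if __name__ == "__main__":
    ipa = phonemize("Dette er en test av norsk tale.")
    if ipa is None:
        print("espeak-ng not found")
    else:
        print(ipa)
        print(stressed_words(ipa))
```

Comparing the stress offsets against where a native speaker would put them is a quick way to tell whether a problem comes from the phonemizer rather than from the voice model.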
Piper is superb for my needs. It runs extremely fast on CPU (so fast it can run in real time on a raspi), so it's perfect for use on laptops without dedicated GPUs. Subjectively, I'd say the quality is about on par with where macOS's TTS was about 10 years ago, which is extremely usable.
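"Real time" can be checked with a real-time factor: synthesis time divided by the duration of the audio produced, where anything below 1.0 is faster than real time. A rough sketch follows, assuming the `piper` executable is on your PATH and accepts text on stdin with `--model` and `--output_file` as in its README; the function names and the model filename in the usage note are placeholders, not anything from this thread.

```python
import subprocess
import time
import wave


def wav_duration_seconds(path):
    """Length of a WAV file in seconds, from its frame count and sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def real_time_factor(synthesis_seconds, audio_seconds):
    """Synthesis time per second of audio; below 1.0 means faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def write_silence_wav(path, seconds, sample_rate=22050):
    """Write 16-bit mono silence, handy for trying the helpers without Piper."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))


def time_piper(text, model_path, out_path):
    """Run the piper CLI once (assumed to be on PATH) and return its real-time factor."""
    start = time.perf_counter()
    subprocess.run(
        ["piper", "--model", model_path, "--output_file", out_path],
        input=text,
        text=True,
        check=True,
    )
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, wav_duration_seconds(out_path))
```

Usage would be something like `time_piper("Hello there.", "voice.onnx", "out.wav")`. Note that the measured time includes loading the model, so short inputs will look slower than steady-state synthesis really is.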
The API endpoint was clearly intended for use only by Edge. Yes, I consider reverse engineering the authentication (even if trivial) and using it for other applications, knowing that was not its intended use, a form of piracy.
That is a very slippery slope to go down. We are already seeing user-agent discrimination, and this is no different than using Bing from a browser that isn't Edge.
I believe the Edge API supports more models:
https://gist.github.com/BettyJJ/17cbaa1de96235a7f5773b8690a2...