Hacker News
by nshm 857 days ago
Metavoice is one of a dozen GPT-based TTS systems out there, starting with Tortoise, and honestly it is not that great. You can clearly hear "glass scratches" in its output; that is because it was trained on MP3-compressed data.

There are much clearer-sounding systems around. Listen to StyleTTS2 for comparison.


Is the crispness of compressed audio really the benchmark for TTS improvements? I feel like that's an aside: a valid point, but not much of a detractor.
Yes, it is one of the important aspects, particularly if you use TTS to create an audiobook or for video production.
Especially as any finished product may end up being compressed again. Lossy-to-lossy audio transcodes ALWAYS lose additional audio data.
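To make the transcoding point concrete, here is a toy sketch in Python. It is not a real codec: it assumes a deliberately simplified "lossy codec" that just snaps integer samples to the nearest multiple of a step size, and the sample values and the step sizes 1000 and 700 (two hypothetical codecs) are made up for illustration. Real codecs such as MP3 quantize in a transform domain under a psychoacoustic model, but the same principle applies: a second lossy pass cannot restore what the first pass threw away, and it adds rounding error of its own.

```python
# Toy "lossy codec": snap each integer sample to the nearest multiple of
# `step` (ties round up). Purely illustrative; real codecs are far more complex.
def lossy(samples, step):
    half = step // 2
    return [step * ((s + half) // step) for s in samples]


def squared_error(reference, degraded):
    # Sum of squared sample differences against the reference signal.
    return sum((r - d) ** 2 for r, d in zip(reference, degraded))


original = [1200, -4700, 8100, 3300, -9500, 5800]  # made-up sample values

first_gen = lossy(original, 1000)    # hypothetical codec A
second_gen = lossy(first_gen, 700)   # hypothetical codec B, different grid

print(first_gen)                              # → [1000, -5000, 8000, 3000, -9000, 6000]
print(second_gen)                             # → [700, -4900, 7700, 2800, -9100, 6300]
print(squared_error(original, first_gen))     # → 520000
print(squared_error(original, second_gen))    # → 1110000
```

In this example the second-generation copy has more than twice the squared error of the first. The extra loss comes from codec B's grid not lining up with codec A's; in this toy model, re-encoding on the identical grid would leave the signal unchanged.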
I had forgotten about StyleTTS2; it was discussed here on HN a couple of months ago. Maybe that's what gave me the feeling that something was going on.
I've tested both. StyleTTS2 is impressive, especially its speed, but its prosody is lacking compared to Metavoice's.
Is it possible to run Metavoice and other PyTorch systems on Apple silicon, e.g. the M1? I keep running into issues.