| HN Mirror



	by nshm 857 days ago
	Metavoice is one of a dozen GPT-based TTS systems that have appeared since Tortoise, and honestly it is not that great. You can clearly hear "glass scratch" artifacts in its audio, because they trained on MP3-compressed data. There are much cleaner-sounding systems around. You can listen to StyleTTS2 to compare.

4 comments

standardly 856 days ago

Is the crispness of compressed audio really the benchmark of TTS improvements? I feel like that's an aside. A valid point, but not much of a drawback.


nshm 856 days ago

Yes, it is one of the important aspects, particularly if you use TTS to create an audiobook or for video production.


lozf 856 days ago

Especially as any finished product may end up being compressed again. Lossy-to-lossy audio transcodes ALWAYS lose additional audio data.


qwertox 857 days ago

I had forgotten about StyleTTS2, and it was discussed here on HN a couple of months ago. Maybe that's what made me feel that there's something going on.


popalchemist 857 days ago

I've tested both. StyleTTS2 is impressive, especially its speed, but its prosody is lacking compared to Metavoice's.


ionwake 857 days ago

Is it possible to run Metavoice and other PyTorch systems on Apple silicon, e.g. the M1? I keep running into issues.
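
For context on the question above: PyTorch ships an "mps" (Metal) backend for Apple silicon, and a common first step is to pick the device explicitly instead of assuming CUDA. The following is only a minimal sketch of that device selection. The helper name pick_device is hypothetical, and nothing here is specific to Metavoice; whether a given model actually runs on "mps" depends on which operators it uses.

```python
def pick_device() -> str:
    """Return the name of the best available PyTorch device.

    Hypothetical helper for illustration; falls back to "cpu" when
    PyTorch is missing or no accelerator backend is available.
    """
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed at all

    # torch.backends.mps is absent in older PyTorch builds, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"  # Apple silicon GPU via Metal
    if torch.cuda.is_available():
        return "cuda"  # NVIDIA GPU
    return "cpu"


if __name__ == "__main__":
    # Models and tensors would then be moved with .to(pick_device()).
    print(pick_device())
```

If a model hits an operator that the "mps" backend does not implement, setting the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 before starting Python lets PyTorch run that operator on the CPU instead, at some cost in speed.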
