| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by randkyp 854 days ago
	I played with it some more and I have to agree. For actual voice _cloning_, XTTS2 sounds much, much closer to the original speaker. But the resulting output is also much more unpredictable and sometimes downright glitchy compared to OpenVoice. XTTS2 also tries to "act out" the implied emotion/tone/pitch/cadence in the input text, for better or worse. But my use case is just to have a nice-sounding local TTS engine, and current text-to-phoneme conversion quirks aside, OpenVoice seems promising. It's fast, too.

1 comments

And StyleTTS2 generalizes out of domain even better than that.