| HN Mirror

	by ricardobeat 313 days ago
	Speech speed is always a tunable parameter and not something intrinsic to the model. The comparison to make is expressiveness and correct intonation for long sentences vs something like espeak. It actually sounds amazing for the size. The closest thing is probably KokoroTTS at 82M params and ~300MB.

1 comments

dvh 313 days ago

I think he meant the overacting typical of English dubs.

Telemakhos 313 days ago

The voices sound artificial and a bit grating. The male voices in particular lack depth: only the last voice has any depth at all, while the others sound like teenagers who haven't finished puberty. None of the voices sounds quite human, and they're all very annoying, partly because they sound like they're acting.

avisser 313 days ago

I heard a little DVa from Overwatch.
