HN Mirror



	by nielsinho 1092 days ago
	TorToiSe (https://github.com/neonbjb/tortoise-tts) produces the best-quality speech of any freely available model. However, its long inference times make it impractical for voice chatbots like Gdansk.

1 comments

What's the reason for the high inference latency? Any ideas on how this could be improved?

TorToiSe is composed of several large models: GPT-2 for text encodings, plus a large VQ-VAE encoder and a large diffusion-model decoder. They run one after another, so their inference costs add up.
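
A rough way to see why the stages add up, as a back-of-the-envelope sketch rather than a profile of the real code. It assumes each stage needs some number of sequential forward passes (one per generated token for the autoregressive part, one per denoising step for the diffusion part). The `Stage` class, the stage names, and every number here are hypothetical placeholders, not measurements of TorToiSe.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    passes: int              # sequential forward passes this stage needs
    seconds_per_pass: float  # hypothetical cost of a single pass


def total_latency(stages):
    """Stages run one after another, so their costs simply add up."""
    return sum(s.passes * s.seconds_per_pass for s in stages)


# Placeholder values for illustration only, NOT measurements of TorToiSe.
pipeline = [
    Stage("autoregressive token generation", passes=200, seconds_per_pass=0.02),
    Stage("diffusion decoding", passes=100, seconds_per_pass=0.05),
]

for stage in pipeline:
    print(f"{stage.name}: {stage.passes * stage.seconds_per_pass:.1f}s")
print(f"total: {total_latency(pipeline):.1f}s")
```

Under this simple model, latency scales linearly with the number of passes, so generating fewer tokens or running fewer diffusion steps cuts it proportionally, usually at some cost in quality.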

Only the big spaghetti inference code (+ weights) has been published, so there's a high barrier to entry for re-training or improving it.

It has been sped up, but it's still not fast enough for this use case: https://github.com/manmay-nakhashi/tortoise-tts-fastest