Hacker News
by nielsinho 1046 days ago
TorToiSe (https://github.com/neonbjb/tortoise-tts) produces the best-quality speech of any freely available model. However, its long inference times make it impractical for voice chatbots like Gdansk.
1 comment

What's the reason for the high inference latency? Any ideas on how this could be improved?
TorToiSe chains several large models: GPT-2 for text encodings, plus a large VQ-VAE encoder and a large diffusion-model decoder.
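
To see where the time goes in a chained pipeline like this, a rough per-stage timer helps. The sketch below is not TorToiSe's API: `autoregressive_stage` and `diffusion_stage` are hypothetical toy stand-ins, and `profile_stages` and `slowest_stage` are illustrative helpers. It only assumes the stages run one after another, with a token-by-token loop followed by a fixed number of refinement passes.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

Stage = Tuple[str, Callable[[Any], Any]]


def profile_stages(stages: List[Stage], payload: Any) -> Tuple[Any, Dict[str, float]]:
    """Run stages in order, feeding each output to the next, and time each one."""
    timings: Dict[str, float] = {}
    for name, fn in stages:
        start = time.perf_counter()
        payload = fn(payload)
        timings[name] = time.perf_counter() - start
    return payload, timings


def autoregressive_stage(text: str) -> List[int]:
    """Hypothetical stand-in: emits one token per step, strictly in sequence."""
    tokens: List[int] = []
    for ch in text:
        tokens.append(ord(ch) % 256)
    return tokens


def diffusion_stage(tokens: List[int], steps: int = 50) -> List[float]:
    """Hypothetical stand-in: refines the whole sequence once per step."""
    frames = [float(t) for t in tokens]
    for _ in range(steps):
        frames = [0.9 * f for f in frames]
    return frames


def slowest_stage(timings: Dict[str, float]) -> str:
    """Return the name of the stage with the largest measured time."""
    return max(timings, key=timings.get)


if __name__ == "__main__":
    stages: List[Stage] = [
        ("autoregressive", autoregressive_stage),
        ("diffusion", diffusion_stage),
    ]
    _, timings = profile_stages(stages, "hello world")
    for name, seconds in timings.items():
        print(f"{name}: {seconds * 1e3:.3f} ms")
    print("slowest:", slowest_stage(timings))
```

With the real models swapped in for the stubs, the same loop would show which stage dominates, which is the first thing to know before trying to speed anything up.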

Only the big spaghetti inference code (+ weights) has been published, so there's a high barrier to entry for re-training or improving it.

It has been sped up, but it is still not fast enough for this use case: https://github.com/manmay-nakhashi/tortoise-tts-fastest