by lhl
996 days ago
For neutral-sounding, very fast and efficient voices, I find the Coqui TTS VITS models very good. For slower, more expressive voices or for voice cloning, I think Coqui TTS's XTTS is good (or you can look at mrq/tortoise-tts). I'm still waiting for a StyleTTS2 implementation. The audio samples sound top-notch: https://styletts2.github.io/
|
Looks promising, I'm going to check it out too! MIT license, even! If it's fast enough for real time, it could be the new best option. The paper claims faster inference than VITS...
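
For anyone wanting to check the "fast enough for real time" part on their own hardware, one common yardstick is the real-time factor (RTF): synthesis time divided by the duration of the audio produced, where anything below 1 means the model generates speech faster than it plays back. Below is a rough sketch of how that could be measured. The names `real_time_factor` and `fake_synthesize` are made up for illustration, and the stand-in synthesizer just returns silence, so you would need to swap in a call to whichever model you are testing (VITS, XTTS, StyleTTS2, ...) along with its actual sample rate.

```python
import time
from typing import Callable, Sequence


def real_time_factor(
    synthesize: Callable[[str], Sequence[float]],
    text: str,
    sample_rate: int,
) -> float:
    """Return synthesis time divided by the duration of the generated audio.

    A value below 1.0 means audio is produced faster than it plays back.
    """
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start

    audio_seconds = len(samples) / sample_rate
    if audio_seconds == 0:
        raise ValueError("synthesizer returned no audio")
    return elapsed / audio_seconds


def fake_synthesize(text: str) -> Sequence[float]:
    # Hypothetical stand-in for a real TTS call: returns 0.1 s of silence
    # per input character at 22050 Hz. Replace with your model's inference.
    return [0.0] * (len(text) * 2205)


rtf = real_time_factor(fake_synthesize, "Hello there", sample_rate=22050)
verdict = "faster" if rtf < 1 else "slower"
print(f"RTF = {rtf:.6f} ({verdict} than real time)")
```

For a fair comparison between models, it probably makes sense to run a warm-up call first and average over several sentences, since the first inference often includes one-off loading costs.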