by Nouser76 851 days ago
I've used coqui.ai's TTS models[0] and library[1] with great success. I was able to render cloned-voice audio in about 80% of the clip's duration, i.e. slightly faster than real time, and I believe you can also stream the response. Do note the model license for XTTS: it's one they wrote themselves, and it has some restrictions.

[0] https://huggingface.co/coqui/XTTS-v2

[1] https://github.com/coqui-ai/TTS
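
For anyone who wants to check the "80% of the clip length" figure on their own hardware, here's a rough sketch of how I'd measure it. The helper names (`wav_duration_seconds`, `real_time_factor`, `clone_and_time`) are made up for illustration. The `TTS(...)` and `tts_to_file(...)` calls follow the usage shown in the library's README as I remember it, so treat the model name and arguments as assumptions and check them against the version you install. It also assumes the output is a plain PCM WAV that Python's `wave` module can read.

```python
import time
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def real_time_factor(synthesis_seconds, audio_seconds):
    """Synthesis time divided by audio length; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def clone_and_time(text, speaker_wav, out_path="output.wav"):
    """Hypothetical helper: synthesize `text` in the voice of `speaker_wav`
    and return the real-time factor of that single call."""
    # Imported lazily so the helpers above work without the TTS package.
    # Assumption: API and model name as shown in the coqui-ai/TTS README.
    from TTS.api import TTS

    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

    # Time only the synthesis call, not model loading.
    start = time.perf_counter()
    tts.tts_to_file(
        text=text,
        speaker_wav=speaker_wav,
        language="en",
        file_path=out_path,
    )
    elapsed = time.perf_counter() - start

    return real_time_factor(elapsed, wav_duration_seconds(out_path))
```

A result around 0.8 from `clone_and_time` would match what I saw. The number depends heavily on GPU vs CPU and on text length, so don't read too much into a single run.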