by siwakotisaurav 1102 days ago
Have you tried Tortoise TTS? I believe ElevenLabs basically forked that and improved the voice quality and speed.

Tortoise TTS is good but takes a long time to render audio, even on its fastest setting with the fast fork.

It wasn't clear to me how Voicebox compares, though.

How does it compare to Voicebox in quality?
I would say that properly configured Tortoise is better, but that comes with the massive caveat that Tortoise:

1 - Is a real pain to get 'working right' - it's not even remotely batteries included

and, more importantly:

2 - Is incredibly slow. I've been turning Heart of Darkness into an audiobook as a unit test and it takes ~30m per paragraph, on average. Add to that the occasional hiccup where a block gets rendered badly (Tortoise occasionally 'drops out' of its selected voice), and Tortoise only really works if you have a ton of compute and still don't mind waiting forever.
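
For anyone who wants to try the same paragraph-by-paragraph approach, here is a rough sketch of how it could be structured, not what I actually run. `synthesize` and `looks_ok` are hypothetical callables you would supply yourself (for example a wrapper around whatever TTS engine you use, plus some check that the voice did not drop out); nothing here is Tortoise's own API. The 30-minute default is just the average quoted above.

```python
from typing import Callable, List


def split_paragraphs(text: str) -> List[str]:
    """Split on blank lines and drop empty chunks."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def estimate_hours(text: str, minutes_per_paragraph: float = 30.0) -> float:
    """Back-of-the-envelope render time, assuming a flat cost per paragraph."""
    return len(split_paragraphs(text)) * minutes_per_paragraph / 60.0


def render_book(
    text: str,
    synthesize: Callable[[str], bytes],      # hypothetical: paragraph -> audio bytes
    looks_ok: Callable[[str, bytes], bool],  # hypothetical: quality check per clip
    max_retries: int = 2,
) -> List[bytes]:
    """Render one clip per paragraph, re-rendering clips that fail the check."""
    clips: List[bytes] = []
    for para in split_paragraphs(text):
        audio = b""
        for _ in range(max_retries + 1):
            audio = synthesize(para)
            if looks_ok(para, audio):
                break
        # After the last retry the clip is kept even if it still looks bad.
        clips.append(audio)
    return clips
```

Rendering per paragraph means a bad block only costs you one re-render instead of the whole chapter, though at ~30m a paragraph each retry still hurts.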

FYI there’s also this fork for faster inference: https://github.com/152334H/tortoise-tts-fast