Hacker News
by leobg 1263 days ago
And the best open source TTS engine I've found so far is Coqui TTS. Demos here:

https://mbarnig.github.io/TTS-Models-Comparison/


I've tried Coqui for generating audio from ebooks and found the edge cases too severe for it to be usable. Coqui has a nasty habit of having strokes: for instance, "hello" may become "hellooooooOOoooooooooooOOOOoOOOOOoOOOOOooooooOOOOOOOOOOOoooooooooOOOOOOOooo...." With some manipulation of the input (like adding punctuation) you can work around this, but if you're trying to automate it for a whole book, I think you'll need a lot of trial and error before you get the input-massaging heuristics right.
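One way to automate that trial and error is to detect a "stroke" after the fact and retry with massaged input. The Python below is a minimal sketch under stated assumptions, not a tested Coqui workaround: `massage_input`, `looks_runaway`, `synthesize_checked`, and the `synthesize` callable are all hypothetical names, the "add terminal punctuation" rule is just the example mentioned above, and the 0.15 seconds-per-character threshold is an assumed rough bound (normal speech is very roughly 15 characters per second).

```python
import re
from typing import Any, Callable, Tuple

# Assumed threshold: normal speech is very roughly 15 chars/s (~0.07 s/char),
# so a clip much longer than ~0.15 s/char probably ran away.
MAX_SECONDS_PER_CHAR = 0.15


def massage_input(text: str) -> str:
    """Collapse runs of whitespace and make sure the chunk ends in terminal
    punctuation (a hypothetical heuristic, not a verified Coqui fix)."""
    text = re.sub(r"\s+", " ", text).strip()
    if text and text[-1] not in ".!?":
        text += "."
    return text


def looks_runaway(text: str, audio_seconds: float) -> bool:
    """Flag clips that are far longer than the text plausibly needs."""
    return audio_seconds > max(1.0, len(text) * MAX_SECONDS_PER_CHAR)


def synthesize_checked(
    text: str, synthesize: Callable[[str], Tuple[Any, float]]
) -> Any:
    """Try the raw text first, then the massaged text.

    `synthesize` is a hypothetical callable that wraps whatever TTS call you
    use and returns (audio, duration_in_seconds).
    """
    for candidate in (text, massage_input(text)):
        audio, seconds = synthesize(candidate)
        if not looks_runaway(candidate, seconds):
            return audio
    raise RuntimeError(f"TTS output still looks runaway for {text!r}")
```

Sentences that still fail after the retry raise an error, so they can be collected and fixed by hand rather than silently ending up in the book.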

The best TTS system for turning ebooks into audio (that I have used so far) is macOS's built-in TTS. The process is fairly straightforward: first, break the book into sentences using the standard heuristics. Run each sentence through TTS to generate an audio file for it, then create a subtitle file with the text of that sentence. Then stitch them all together with ffmpeg, optionally adding a dummy video track if your media player of choice needs one to display subtitles (mpv does not, but some do). Now you have a subtitled "audiobook."
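The steps above can be sketched in Python roughly as follows, assuming macOS's `say` command and `ffmpeg`/`ffprobe` are on the PATH. This is an illustration, not the commenter's actual script. The regex splitter is a naive stand-in for "the standard heuristics" (it will wrongly split on abbreviations like "Mr."). It writes one SRT file for the whole book instead of per-sentence subtitle files, and it outputs Matroska with no dummy video track. The file names (`work/`, `book.srt`, `clips.txt`, `book.mkv`) are hypothetical.

```python
import re
import subprocess
import sys
from pathlib import Path


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after . ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def build_srt(sentences: list[str], durations: list[float]) -> str:
    """One subtitle cue per sentence, laid end to end on the timeline."""
    cues, start = [], 0.0
    for i, (text, dur) in enumerate(zip(sentences, durations), start=1):
        end = start + dur
        cues.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
        start = end
    return "\n".join(cues)


def audio_duration(path: Path) -> float:
    """Ask ffprobe for the container duration in seconds."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", str(path)],
        capture_output=True, text=True, check=True,
    )
    return float(result.stdout.strip())


def make_audiobook(text: str, workdir: Path, out: Path) -> None:
    workdir.mkdir(parents=True, exist_ok=True)
    sentences = split_sentences(text)
    clips, durations = [], []
    for i, sentence in enumerate(sentences):
        clip = (workdir / f"{i:05d}.aiff").resolve()
        # With no message argument, `say` reads the text from stdin.
        subprocess.run(["say", "-o", str(clip)], input=sentence, text=True, check=True)
        clips.append(clip)
        durations.append(audio_duration(clip))
    srt = workdir / "book.srt"
    srt.write_text(build_srt(sentences, durations), encoding="utf-8")
    # ffmpeg concat demuxer list; assumes paths contain no single quotes.
    playlist = workdir / "clips.txt"
    playlist.write_text("".join(f"file '{c}'\n" for c in clips), encoding="utf-8")
    # Concatenate the clips, attach the subtitles, write a Matroska file.
    subprocess.run(
        ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", str(playlist),
         "-i", str(srt), "-map", "0:a", "-map", "1:s",
         "-c:a", "aac", "-c:s", "srt", str(out)],
        check=True,
    )


if __name__ == "__main__":
    make_audiobook(Path(sys.argv[1]).read_text(encoding="utf-8"),
                   Path("work"), Path("book.mkv"))
```

Saved under a hypothetical name such as `make_audiobook.py`, it would be run as `python make_audiobook.py book.txt`. Measuring each clip with ffprobe before building the SRT keeps the subtitle cues aligned with the concatenated audio. Small drift from AAC re-encoding is possible over a long book.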

macOS's TTS has a robotic quality to it, but it generally works, and you can get used to it fairly easily if you give it a chance.