I've tried Coqui for turning ebooks into audiobooks and found the edge cases too severe for it to be usable. Coqui has a nasty habit of having strokes; for instance, "hello" may become "hellooooooOOoooooooooooOOOOoOOOOOoOOOOOooooooOOOOOOOOOOOoooooooooOOOOOOOooo...." With some manipulation of the input (like adding punctuation) you can get around this, but if you're trying to automate it for a whole book, I think you'll need a lot of trial and error before you get the input-massaging heuristics right.
The best TTS system for turning ebooks into audiobooks (that I have used so far) is macOS's TTS. The process is fairly straightforward. First, break the book into sentences using the standard heuristics. Run each sentence through TTS to generate an audio file for it, and create a subtitle entry with the text of that sentence, timed to the length of its audio. Then stitch everything together with ffmpeg, optionally adding a dummy video track if your media player of choice needs one to display subtitles (mpv does not, but some do). Now you have a subtitled "audiobook."
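For what it's worth, here is a rough Python sketch of that pipeline, not the exact script I use. Assumptions: it runs on macOS with `say` available, `ffmpeg` and `ffprobe` are on the PATH, and the regex splitter is only a crude stand-in for the "standard heuristics" (it will trip on abbreviations like "Dr."). All function names, the file names, and the choice of AAC output are made up for illustration, and the dummy video track is left out.

```python
import re
import subprocess
from pathlib import Path


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"


def build_srt(sentences, durations):
    """One cue per sentence, each starting where the previous clip ended."""
    cues, t = [], 0.0
    for i, (text, dur) in enumerate(zip(sentences, durations), start=1):
        cues.append(f"{i}\n{srt_timestamp(t)} --> {srt_timestamp(t + dur)}\n{text}\n")
        t += dur
    return "\n".join(cues)


def synthesize(sentence, out_path):
    # macOS only. With no message argument, `say` reads the text from stdin,
    # which avoids sentences that start with "-" being parsed as options.
    subprocess.run(["say", "-o", str(out_path)], input=sentence, text=True, check=True)


def clip_duration(path):
    # Ask ffprobe for the container duration, in seconds.
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", str(path)],
        capture_output=True, text=True, check=True,
    )
    return float(result.stdout.strip())


def make_audiobook(text, workdir):
    """Hypothetical driver: writes book.m4a and book.srt into workdir."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    sentences = split_sentences(text)
    durations, list_lines = [], []
    for i, sentence in enumerate(sentences):
        clip = workdir / f"{i:06d}.aiff"
        synthesize(sentence, clip)
        durations.append(clip_duration(clip))
        list_lines.append(f"file '{clip.name}'")
    # Input list for ffmpeg's concat demuxer; paths are relative to this file.
    (workdir / "clips.txt").write_text("\n".join(list_lines) + "\n")
    (workdir / "book.srt").write_text(build_srt(sentences, durations), encoding="utf-8")
    subprocess.run(
        ["ffmpeg", "-f", "concat", "-safe", "0", "-i", "clips.txt",
         "-c:a", "aac", "book.m4a"],
        cwd=workdir, check=True,
    )
```

Calling `make_audiobook(open("book.txt").read(), "out")` should leave `out/book.m4a` and `out/book.srt` side by side. Since the cue times are just the summed clip lengths, I'd expect some drift over a long book, so treat the timing as approximate.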
macOS's TTS has a robotic quality to it, but it generally works, and you can become accustomed to it fairly easily if you give it a chance.