by staunch 1260 days ago
I'm looking forward to some new AI system making it possible to generate high quality audiobooks from epub files.

This "AI book reader" doesn't exist yet, right?

2 comments

Working on this now. Cherry-picked audio from such AI systems abounds, but submitting arbitrary text for audio synthesis requires a much higher quality bar to be listenable. Planning to have something out in early 2023.
What's the best off the shelf solution you've found? I've been trying a few different apps to read my queued articles to me, but all of the voices are awful.
I wrote my own script to send article text through GCP Text-to-Speech, which I then put on a private podcast feed. If you go through the voices, there are some decent ones. I also found that speeding up the audio in the podcast app greatly improves listenability. I also find that picking an AI voice with an accent (for me British, Aussie, or Indian) helps trick my brain into tolerance.
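For anyone wanting to try the same thing, here is a minimal sketch of what such a script might look like. It is not the commenter's actual script. It assumes the `google-cloud-texttospeech` client library is installed and credentials are configured; the function names `chunk_text` and `synthesize_article`, the 4500-byte chunk budget, and the default `en-GB` language code are all illustrative choices. The chunking exists because the API caps the input size per request (5,000 bytes, as far as I know).

```python
import re


def chunk_text(text, max_bytes=4500):
    """Split text into chunks at sentence boundaries, each within max_bytes.

    A single sentence longer than max_bytes is kept as its own oversize
    chunk; a real script would need to split it further.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate.encode("utf-8")) <= max_bytes or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def synthesize_article(text, out_path, language_code="en-GB", speaking_rate=1.0):
    """Send article text to Google Cloud Text-to-Speech and write one MP3.

    Imported lazily so the chunking helper works without the library.
    """
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    voice = texttospeech.VoiceSelectionParams(language_code=language_code)
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3,
        speaking_rate=speaking_rate,  # >1.0 speeds the voice up
    )
    with open(out_path, "wb") as out:
        for chunk in chunk_text(text):
            response = client.synthesize_speech(
                input=texttospeech.SynthesisInput(text=chunk),
                voice=voice,
                audio_config=audio_config,
            )
            # Appending MP3 segments back to back plays fine in most
            # players, though it is not a clean remux.
            out.write(response.audio_content)
```

Calling `synthesize_article(article_text, "article.mp3", speaking_rate=1.25)` would produce a file you could then attach to a podcast feed; the feed part is left out here.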

However, I also got tired of manually extracting text and running my script, so now I mostly use https://www.listenlater.fm, though there I just use the “Matthew” voice. For 5-10 min articles it doesn’t really matter; you get used to the voices so long as you can do something else in the meantime lol.

Honestly I haven't found one that I'd call really usable. That's why I'm building it myself. There are a lot of applications that are fine to listen to for like super short text (a few sentences), but for any long-form prose it gets intolerable IMO. And that's from someone who has listened to lots of intolerable audio from my own system.

EDIT: I'll also add that although there are *many* TTS apps, the true diversity is far less because most of these services are using Amazon Polly, Azure, or GCP's equivalents.

And the best open source TTS engine I've found so far is Coqui TTS. Demos here:

https://mbarnig.github.io/TTS-Models-Comparison/

I've tried Coqui for generating audiobooks from ebooks and found the edge cases to be too severe for this to be usable; Coqui has a nasty habit of having strokes, for instance "hello" may become "hellooooooOOoooooooooooOOOOoOOOOOoOOOOOooooooOOOOOOOOOOOoooooooooOOOOOOOooo...." With some manipulation of the input (like adding punctuation) you can get around this, but if you're trying to automate this for a whole book then I think you'll need a lot of trial and error before you get the input-massaging heuristics right.
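As a rough illustration of the kind of input massaging meant here, the sketch below collapses whitespace and appends a full stop to any fragment that lacks terminal punctuation. The function name `massage_for_tts` and the specific rules are assumptions for illustration only; nothing here is a known fix for Coqui, and real heuristics would need the trial and error described above.

```python
import re

# Characters treated as "already terminated" (an illustrative choice).
_TERMINAL = (".", "!", "?", ":", ";", "…")
# Closing quotes and brackets that may follow the terminal punctuation.
_CLOSERS = "\"'”’)]"


def massage_for_tts(text):
    """Normalize a text fragment before handing it to a TTS engine."""
    cleaned = re.sub(r"\s+", " ", text).strip()
    if not cleaned:
        return cleaned
    # Look past trailing quotes/brackets when checking for punctuation.
    if not cleaned.rstrip(_CLOSERS).endswith(_TERMINAL):
        cleaned += "."
    return cleaned
```

For example, a bare heading like `"Chapter One"` becomes `"Chapter One."`, while `'He said "stop!"'` is left unchanged.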

The best TTS system for generating audiobooks from ebooks (that I have used so far) is macOS's TTS. The process is fairly straightforward: first break the book into sentences using the standard heuristics. Run each of those sentences through TTS to generate an audio file for that sentence, then create a subtitle file with the text of the sentence. Then stitch them all together with ffmpeg, optionally adding a dummy video track if your media player of choice needs one to display subtitles (mpv does not, but some do). Now you have a subtitled "audiobook."
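The steps above could be sketched roughly as follows. This is a guess at the shape of the pipeline, not the commenter's code. The helper names, the naive regex sentence splitter, and the SRT subtitle format are assumptions. The command builders only return argument lists for macOS's `say` and for ffmpeg's concat demuxer; per-sentence durations are passed in, since measuring them (for example with ffprobe) is left out.

```python
import re


def split_sentences(text):
    """Very rough sentence splitter: break after ., ! or ? plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(sentences, durations):
    """Build SRT text; each sentence is shown for its audio clip's duration."""
    blocks, start = [], 0.0
    for index, (sentence, duration) in enumerate(zip(sentences, durations), 1):
        end = start + duration
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{sentence}\n"
        )
        start = end
    return "\n".join(blocks)


def say_command(sentence, out_path, voice=None):
    """Argument list for macOS `say`, writing speech to an audio file."""
    cmd = ["say"]
    if voice:
        cmd += ["-v", voice]
    return cmd + ["-o", out_path, sentence]


def concat_list(paths):
    """Contents of the list file read by ffmpeg's concat demuxer."""
    return "".join(f"file '{p}'\n" for p in paths)


def ffmpeg_concat_command(list_path, out_path):
    """Argument list that stitches the listed clips without re-encoding."""
    return ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_path,
            "-c", "copy", out_path]
```

Each command list can be passed to `subprocess.run(cmd, check=True)` on a Mac with ffmpeg installed. Writing the `build_srt` output next to the stitched audio under the same base name is usually enough for mpv to pick up the subtitles.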

macOS's TTS has a robotic quality to it, but it generally works and you can become accustomed to it fairly easily if you give it a chance.

iOS: Voice Dream Reader. Use Ivona “Amy” or “Joey”. Also allows annotations such as comments and highlights and exporting your annotations.
That's the one I've been using mostly, but am fairly displeased with the voices. GP's point about nice samples that fall apart in real use is accurate.
The Google Books mobile app will read any(?) book in your library, and at least they used to support uploading your own books, although I haven't tried it in a while: https://play.google.com/books/uploads?type=ebooks. Then, for the individual book, tap on it, tap the bamboo menu, and choose "Read aloud".
Thanks - that is really good, and I didn't realise that was something I already had installed.

I'd be happy to pay for a service where I can adjust the reader's voice and speed, but this worked amazingly well, and the enhanced version of the audio is really good.