Hacker News
by lupusreal 753 days ago
I've been using Piper for this. The quality is (in my subjective opinion) as good as the TTS built into macOS, it's open source, and it's so fast that you can run it in real time on a Raspberry Pi. On a real computer I can generate a whole audiobook in about 20 minutes.

What I do is split the book into sentences, generate speech for each sentence, and at the same time turn that sentence into a subtitle. Then I combine the two and stitch them all together into an MP4 container with an audio track and a subtitle track using ffmpeg. mpv (and I think VLC) can display subtitles synced to audio playback even when there is no video track.
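
For anyone who wants to try this, here is a rough Python sketch of the pipeline described above. It is not the commenter's actual script. The function names are made up for illustration, the sentence splitter is deliberately naive, and the Piper and ffmpeg invocations are assumptions based on their usual command-line usage (Piper reading text on stdin with `--model` and `--output_file`, ffmpeg writing MP4 subtitles with the `mov_text` codec). Concatenating the per-sentence WAV clips into one audio file is left out.

```python
import re
import wave


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def wav_duration(path):
    """Length of a WAV clip in seconds, read from its header."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def build_srt(sentences, durations):
    """One cue per sentence, timed by the running total of clip lengths."""
    cues, start = [], 0.0
    for i, (sentence, dur) in enumerate(zip(sentences, durations), 1):
        end = start + dur
        cues.append(
            f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{sentence}\n"
        )
        start = end
    return "\n".join(cues)


def piper_cmd(model_path, wav_path):
    """Assumed Piper invocation; the sentence text is piped in on stdin."""
    return ["piper", "--model", model_path, "--output_file", wav_path]


def mux_cmd(audio_path, srt_path, out_path):
    """Assumed ffmpeg invocation: audio plus a mov_text subtitle track in MP4."""
    return ["ffmpeg", "-i", audio_path, "-i", srt_path,
            "-c:a", "aac", "-c:s", "mov_text", out_path]
```

In use, each sentence would be passed to `subprocess.run(piper_cmd(...), input=sentence, text=True)`, the clip lengths collected with `wav_duration`, the subtitle file written from `build_srt`, and the final file produced by running `mux_cmd` on the joined audio. Timing each cue from the clip length is what keeps the subtitles in sync with playback.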

1 comment

That's genius! Was it a lot of work to set up?
Super cool! A lot of what you are describing I want to do in the future too.

The issue I personally found with traditional TTS is the lack of emotional range and of thoughtful pauses. ML models are better at this, and at picking up on small cues that are otherwise hard to program into a TTS system.

I love that Safari on the iPhone has built-in TTS now, and I was excited to use it. It actually didn't work on Make by Pieter Levels after I bought it, so I went to explore other options. After I started listening to AI-generated TTS, I just couldn't go back. It's like 270p vs 2160p (4K).