
glad to hear it!

went through quite a few iterations of aligning text to speech. found that ai transcription was really good most of the time but would hallucinate quite a bit towards the start and end of books, which I think might be related to those models being partially trained on audiobooks while only having the book text itself, without any of the intro or credits.

in the end I landed on extracting text from ebooks, using rule-based, language-specific segmentation, and espeak-based alignment. pretty basic, but it worked wonders in terms of reliability and accuracy.
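
to give a rough idea of what the segmentation step can look like, here is a minimal sketch in python. it is not the actual code: the `ABBREVIATIONS` table, the `segment` function and the tiny abbreviation lists are all made up for illustration, and a real setup would need much bigger per-language rules. it only covers sentence splitting, not the ebook extraction or the espeak alignment.

```python
import re

# Hypothetical per-language abbreviation lists (illustrative only).
# A real system would need far more entries and extra rules per language.
ABBREVIATIONS = {
    "en": {"mr", "mrs", "dr", "st", "vs", "etc"},
    "de": {"dr", "hr", "fr", "nr", "usw", "bzw"},
}

# Candidate boundary: sentence-ending punctuation, optional closing
# quote or bracket, followed by whitespace.
_BOUNDARY = re.compile(r'([.!?]+["”’)\]]*)\s+')

# Characters stripped from a word before the abbreviation lookup.
_STRIP_CHARS = '.!?"“”‘’\'()[]'


def segment(text: str, lang: str = "en") -> list[str]:
    """Split text into sentences using simple, language-specific rules."""
    abbreviations = ABBREVIATIONS.get(lang, set())
    sentences = []
    start = 0
    for match in _BOUNDARY.finditer(text):
        candidate = text[start:match.end(1)]
        words = candidate.split()
        last_word = words[-1].strip(_STRIP_CHARS).lower() if words else ""
        # A period right after a known abbreviation is not a sentence end.
        if match.group(1).startswith(".") and last_word in abbreviations:
            continue
        sentences.append(candidate.strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    for sentence in segment("Dr. Smith arrived. He was late!", "en"):
        print(sentence)
```

the nice part of keeping it rule based is that it is deterministic: the same text always gives the same segments, so nothing gets invented or dropped at the start or end of a book.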

if you are looking to generate audio from ebooks this is probably not too helpful. it is something I tried to avoid. something about learning a language from generated audio didn't sit right with me haha.