Hacker News
by flakiness 650 days ago
How do you get the source data (text) from a book? To me it is the major roadblock for LLM-based commercial content consumption.

Old books are on Gutenberg, archive.org etc.

Physical ones, I scan. Cutting the spine is easiest. But today you can also just take pics with your phone.

Many retailers also sell EPUB, which is basically a ZIP archive of HTML/XHTML files.
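
To make the EPUB point concrete, here is a rough sketch of pulling plain text out of one using only the Python standard library. It assumes a DRM-free file, and the names `epub_to_text` and `html_to_text` are just ones I made up for illustration. It also reads the chapter files in archive-name order, which is a simplification: the real reading order is listed in the book's OPF spine.

```python
import zipfile
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(markup):
    """Strip tags from one HTML/XHTML document, one text run per line."""
    parser = _TextCollector()
    parser.feed(markup)
    parser.close()
    return "\n".join(parser.parts)


def epub_to_text(path):
    """Concatenate the text of every HTML/XHTML file inside the EPUB.

    Assumption: files are visited in sorted name order, not spine order.
    """
    chunks = []
    with zipfile.ZipFile(path) as book:
        for name in sorted(book.namelist()):
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                markup = book.read(name).decode("utf-8", errors="replace")
                chunks.append(html_to_text(markup))
    return "\n\n".join(chunks)
```

Something like `print(epub_to_text("book.epub")[:500])` (with `book.epub` standing in for your own file) should show the first few hundred characters. Front matter and navigation pages come along for the ride, so expect to trim a bit.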

Obviously, that’s all for private consumption only. (Unless you’re OpenAI I guess. :-P)

Oh, you got serious! Salute to you from a lazy dad.