by MrDrMcCoy
709 days ago
Most of the books are bloated PDFs. I'm slowly working on a project to reliably convert PDF to DjVu, which on average yields a highly readable document that's 33% of the original size on disk. The project is proving difficult: the tooling for DjVu is quite moldy now, and the output often needs to be manually reviewed to ensure the file remains readable. pdf2djvu exists, but it's highly unreliable and thus can't be used in bulk. Other ebook formats are XML-based and tend to be similarly bloated due to the overhead of the markup. It's a hard problem, with so few good file format choices.
|
Ultimately that content is going to need to be represented as raw UTF-8 text and encoded images, so I don't see much upside to migrating it from one intermediate lossy file format to another.