Hacker News
by fastier 2850 days ago
Where is .djvu?
2 comments

Do you need an option for that? You can convert to PDF and then `pdf2djvu` it.
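For anyone who wants to script that second step, here is a rough sketch, assuming `pdf2djvu` is installed and on your PATH. The helper names (`pdf2djvu_command`, `convert`) are made up for illustration; only the `-o` output flag comes from the tool itself.

```python
import subprocess
from pathlib import Path


def pdf2djvu_command(pdf_path, djvu_path=None):
    """Build the pdf2djvu argument list (hypothetical helper, not part of pdf2djvu)."""
    pdf = Path(pdf_path)
    # Default to the same file name with a .djvu extension.
    out = Path(djvu_path) if djvu_path else pdf.with_suffix(".djvu")
    return ["pdf2djvu", "-o", str(out), str(pdf)]


def convert(pdf_path, djvu_path=None):
    """Run pdf2djvu; raises CalledProcessError if the tool exits non-zero."""
    subprocess.run(pdf2djvu_command(pdf_path, djvu_path), check=True)
```

Calling `convert("book.pdf")` should leave a `book.djvu` next to the input, provided the tool is available.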
I believe the best you could do is extract the raw OCR'd text from the document (with some other tool). No formatting or text hierarchy is preserved in the OCR process, only the physical location and size of the text on the page. From the extracted text, you can convert to Markdown or whatever and then manually edit it to give the OCR text some structure.
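A minimal sketch of that workflow, assuming the document is a DjVu file with a text layer and that "some other tool" is DjVuLibre's `djvutxt` (the comment does not name one). The helper names and the paragraph-splitting rule are assumptions for illustration; the result is only a draft that still needs manual structuring.

```python
import re
import subprocess
from pathlib import Path


def djvutxt_command(djvu_path, txt_path):
    """Build the djvutxt argument list: input file, then output text file."""
    return ["djvutxt", str(Path(djvu_path)), str(Path(txt_path))]


def extract_text(djvu_path, txt_path):
    """Dump the OCR text layer to a plain-text file (requires djvutxt on PATH)."""
    subprocess.run(djvutxt_command(djvu_path, txt_path), check=True)


def to_markdown_draft(text, title):
    """Wrap raw text in a bare Markdown skeleton (hypothetical helper).

    Blank lines are treated as paragraph breaks and whitespace inside each
    paragraph is collapsed. Headings, lists, and emphasis are not recovered.
    """
    paragraphs = [
        " ".join(chunk.split())
        for chunk in re.split(r"\n\s*\n", text)
        if chunk.strip()
    ]
    return f"# {title}\n\n" + "\n\n".join(paragraphs) + "\n"
```

Since the OCR layer carries no hierarchy, the only heading here is the title you pass in; everything else comes out as flat paragraphs to edit by hand.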