Hacker News
by flicken 2221 days ago
Although https://www.willus.com/k2pdfopt/ is meant for reformatting PDFs to view on e-readers, it does a reasonable job of extracting text via OCR and storing it as a PDF layer. The underlying engine can be either https://github.com/tesseract-ocr/tesseract or http://jocr.sourceforge.net/
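For anyone who wants to script this, here is a rough sketch of how the call might look from Python. The flags are assumptions based on my reading of the k2pdfopt usage text, not something stated above: `-mode copy` to leave the page layout alone, `-ocr t` or `-ocr g` to pick Tesseract or GOCR, and `-o` for the output name. The helper names `build_k2pdfopt_command` and `add_text_layer` are made up for illustration. Check the flags against your installed version before relying on them.

```python
import shutil
import subprocess

# Assumed mapping from engine name to the letter passed to k2pdfopt's -ocr option.
OCR_ENGINES = {"tesseract": "t", "gocr": "g"}


def build_k2pdfopt_command(src, dst, engine="tesseract"):
    """Return an argv list asking k2pdfopt to add an OCR text layer (hypothetical helper)."""
    if engine not in OCR_ENGINES:
        raise ValueError(f"unknown OCR engine: {engine!r}")
    return [
        "k2pdfopt",
        src,
        "-mode", "copy",              # assumed: keep the original page layout
        "-ocr", OCR_ENGINES[engine],  # assumed: select the OCR engine
        "-o", dst,                    # assumed: output file name
    ]


def add_text_layer(src, dst, engine="tesseract"):
    """Run k2pdfopt if it is installed; raises if the binary is missing or the run fails."""
    if shutil.which("k2pdfopt") is None:
        raise FileNotFoundError("k2pdfopt not found on PATH")
    subprocess.run(build_k2pdfopt_command(src, dst, engine), check=True)
```

Something like `add_text_layer("scan.pdf", "scan_ocr.pdf", engine="gocr")` would then use the GOCR engine. Building the argument list separately from running it makes it easy to print the command and try it by hand first.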