by sandreas 2281 days ago
Thank you HN for this article and the comments.

Years ago I wrote a little Java command-line tool based on BoofCV (a pure-Java CV library, https://boofcv.org/), tess4j (Tesseract) and PDFBox to create PDFs with OCR and an invisible text layer, making their contents searchable, like the OCR option in PDF-XChange Viewer.

I used a combination of thresholding (e.g. Sauvola, Nick) and deskewing to clean up my scanned documents before OCR; see https://boofcv.org/index.php?title=Example_Thresholding
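
For anyone curious what Sauvola thresholding does, here is a rough plain-Java sketch. It is not BoofCV's implementation and not my old code, just the textbook formula T = m * (1 + k * (s / R - 1)) evaluated per pixel over a local window, where m and s are the local mean and standard deviation. The class and method names are made up for illustration, and k = 0.3 and R = 128 are assumed conventional values for 8-bit grayscale input.

```java
// Illustrative sketch only: hypothetical class, not part of BoofCV.
final class SauvolaSketch {

    /**
     * Classifies each pixel of an 8-bit grayscale image (values 0..255) as
     * dark foreground (true) or background (false) with Sauvola's local threshold.
     *
     * @param gray   image as gray[row][column]
     * @param radius half-size of the square window; the window is clipped at the borders
     * @param k      sensitivity, assumed here to be around 0.2 to 0.5
     */
    static boolean[][] binarize(int[][] gray, int radius, double k) {
        int h = gray.length;
        int w = gray[0].length;

        // Integral images of the values and the squared values, padded with a
        // zero row and column so window sums need no special cases.
        long[][] sum = new long[h + 1][w + 1];
        long[][] sumSq = new long[h + 1][w + 1];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                long v = gray[y][x];
                sum[y + 1][x + 1] = v + sum[y][x + 1] + sum[y + 1][x] - sum[y][x];
                sumSq[y + 1][x + 1] = v * v + sumSq[y][x + 1] + sumSq[y + 1][x] - sumSq[y][x];
            }
        }

        final double R = 128.0; // assumed dynamic range of the standard deviation
        boolean[][] text = new boolean[h][w];
        for (int y = 0; y < h; y++) {
            int y0 = Math.max(0, y - radius);
            int y1 = Math.min(h, y + radius + 1); // exclusive
            for (int x = 0; x < w; x++) {
                int x0 = Math.max(0, x - radius);
                int x1 = Math.min(w, x + radius + 1); // exclusive
                double n = (double) (y1 - y0) * (x1 - x0);

                double mean = (sum[y1][x1] - sum[y0][x1] - sum[y1][x0] + sum[y0][x0]) / n;
                double meanSq = (sumSq[y1][x1] - sumSq[y0][x1] - sumSq[y1][x0] + sumSq[y0][x0]) / n;
                double std = Math.sqrt(Math.max(0.0, meanSq - mean * mean));

                // Sauvola: the threshold drops below the local mean in flat regions
                // and moves back toward it where local contrast is high.
                double threshold = mean * (1.0 + k * (std / R - 1.0));
                text[y][x] = gray[y][x] < threshold;
            }
        }
        return text;
    }
}
```

Calling SauvolaSketch.binarize(pixels, 15, 0.3) on a scanned page would be a reasonable starting point, but the window radius and k are guesses that need tuning per document. The integral images keep the cost per pixel constant regardless of window size.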

Now I plan to restore and improve the old code and release it as an open source solution :-) I hope the code is still somewhere on my old hard disks.