by sandreas 2216 days ago
My approach (in Java) was to clean up the image with a set of BoofCV filters, run Tess4J OCR to make the document searchable, and then use Apache PDFBox to create a PDF with an invisible text layer. It's not open source yet (I plan to release it), but you could take a look at https://github.com/ctodobom/OpenNoteScanner, which seems to be much more advanced.
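One detail that makes an invisible text layer line up with the scan is coordinate conversion. OCR engines such as Tesseract report word boxes in image pixels with a top-left origin, while PDF user space is in points (1/72 inch) with a bottom-left origin. The sketch below is not the poster's code. The class name `OcrToPdfCoords`, its methods, and the 300 dpi A4 example are all hypothetical, chosen for illustration. It maps a pixel box to PDF coordinates and computes a font size that makes the text span the word's width.

```java
/**
 * Hypothetical helper (not from the original project): maps an OCR word box
 * from image pixels (origin top-left, y grows downward) to PDF user space
 * (points = 1/72 inch, origin bottom-left, y grows upward).
 */
public class OcrToPdfCoords {

    /** A rectangle in PDF user space, measured in points. */
    static final class PdfBox {
        final float x, y, width, height;

        PdfBox(float x, float y, float width, float height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    /**
     * Converts a word box given in pixels to PDF points.
     *
     * @param px, py, pw, ph  box left, top, width, height in pixels
     * @param dpi             resolution the page was scanned at (assumed known)
     * @param pageHeightPx    full image height in pixels
     */
    static PdfBox toPdf(int px, int py, int pw, int ph, float dpi, int pageHeightPx) {
        float scale = 72f / dpi; // pixels -> points
        float x = px * scale;
        // Flip the y axis: PDF y is the distance from the bottom edge
        // of the page to the bottom edge of the box.
        float y = (pageHeightPx - (py + ph)) * scale;
        return new PdfBox(x, y, pw * scale, ph * scale);
    }

    /**
     * Font size at which a string whose width is stringWidthUnits
     * (in 1/1000 text-space units) spans exactly widthPt points.
     */
    static float fitFontSize(float widthPt, float stringWidthUnits) {
        return widthPt * 1000f / stringWidthUnits;
    }

    public static void main(String[] args) {
        // A 300 dpi A4 scan is about 2480 x 3508 px; this word box is made up.
        PdfBox b = toPdf(300, 600, 600, 60, 300f, 3508);
        System.out.printf("x=%.2f y=%.2f w=%.2f h=%.2f%n", b.x, b.y, b.width, b.height);
        System.out.printf("font size=%.2f%n", fitFontSize(b.width, 2880f));
    }
}
```

With PDFBox 2.x, you would then draw each word at `(x, y)` after setting the content stream's rendering mode to `RenderingMode.NEITHER`. This makes the text selectable and searchable without painting it over the image. The `stringWidthUnits` argument corresponds to what `PDFont.getStringWidth` returns. Fitting to the box width rather than its height keeps text selection aligned with the visible words.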