by sandreas
2281 days ago
Thank you HN for this article and the comments. Years ago I wrote a little Java command line tool based on BoofCV (pure Java CV lib - https://boofcv.org/), Tess4J (Tesseract) and PDFBox to create PDFs with OCR and an invisible text layer, making their contents searchable like the OCR option in PDF-XChange Viewer does. I used a combination of thresholding (e.g. Sauvola, NICK) and deskewing to improve my documents - see https://boofcv.org/index.php?title=Example_Thresholding. Now I plan to restore and improve the old code and provide it as an open source solution :-) I hope the code is still somewhere on my old hard disks.
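
For readers who have not come across Sauvola thresholding, a minimal plain-Java sketch of the idea might look like the following. It is not the code of the tool described above and does not use BoofCV's API. The class name `SauvolaSketch`, the method `binarize`, and the parameter values in `main` (radius 2, k = 0.3, R = 128) are assumptions chosen for illustration. For each pixel, the local mean m and standard deviation s of a surrounding window give the threshold T = m * (1 + k * (s / R - 1)), and pixels at or below T are treated as text.

```java
public class SauvolaSketch {

    /**
     * Classifies each pixel of an 8-bit grayscale image as text (true) or
     * background (false) using a per-pixel Sauvola threshold.
     *
     * gray   : image as gray[row][column], values 0..255
     * radius : half-width of the square local window (assumed parameter)
     * k      : sensitivity factor (assumed, commonly around 0.2 to 0.5)
     * r      : dynamic range of the standard deviation (128 for 8-bit data)
     */
    static boolean[][] binarize(int[][] gray, int radius, double k, double r) {
        int h = gray.length;
        int w = gray[0].length;
        boolean[][] text = new boolean[h][w];

        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                // Local window, clipped at the image border.
                int y0 = Math.max(0, y - radius);
                int y1 = Math.min(h - 1, y + radius);
                int x0 = Math.max(0, x - radius);
                int x1 = Math.min(w - 1, x + radius);

                double sum = 0;
                double sumSq = 0;
                int n = 0;
                for (int yy = y0; yy <= y1; yy++) {
                    for (int xx = x0; xx <= x1; xx++) {
                        int v = gray[yy][xx];
                        sum += v;
                        sumSq += (double) v * v;
                        n++;
                    }
                }

                double mean = sum / n;
                // Guard against tiny negative values from rounding.
                double variance = Math.max(0.0, sumSq / n - mean * mean);
                double std = Math.sqrt(variance);

                // Sauvola threshold: T = m * (1 + k * (s / R - 1))
                double threshold = mean * (1.0 + k * (std / r - 1.0));
                text[y][x] = gray[y][x] <= threshold;
            }
        }
        return text;
    }

    public static void main(String[] args) {
        // Tiny synthetic "page": light background with one dark pixel.
        int[][] page = new int[5][5];
        for (int[] row : page) {
            java.util.Arrays.fill(row, 200);
        }
        page[2][2] = 20;

        boolean[][] text = binarize(page, 2, 0.3, 128.0);
        for (boolean[] row : text) {
            StringBuilder line = new StringBuilder();
            for (boolean isText : row) {
                line.append(isText ? '#' : '.');
            }
            System.out.println(line);
        }
    }
}
```

The nested loops recompute the window statistics for every pixel, which is fine for a sketch but slow on full page scans. A real implementation would typically use integral images or a library routine instead.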