Hacker News
by tmbsundar 2211 days ago
>> - A tool to improve scanned document images and create a pdf

Would you mind sharing the approach you used or the product page if it is public?

Incidentally, today I was trying to combine two images into a single image and convert that into a single PDF. I tried Paint.NET, MS Paint 3D, etc., but it was messy and the resulting PDF was also huge. Finally, I gave up, manually pasted the images into a Word doc, and exported it as a single PDF.
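For anyone who would rather script the combining step, here is a minimal sketch that uses only the JDK (`java.awt` and `javax.imageio`, no third-party libraries). It assumes the two images should be stacked vertically on a white background; the class and method names (`StackImages`, `stackVertically`, `toJpeg`) are my own, not from any existing tool. Encoding the result as JPEG rather than a lossless format usually gives a much smaller file for photographed or scanned pages.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

public class StackImages {
    /** Stacks two images vertically; any area not covered by either image stays white. */
    static BufferedImage stackVertically(BufferedImage top, BufferedImage bottom) {
        int width = Math.max(top.getWidth(), bottom.getWidth());
        int height = top.getHeight() + bottom.getHeight();
        // TYPE_INT_RGB has no alpha channel, which keeps the JPEG writer happy
        BufferedImage out = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = out.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, width, height);
        g.drawImage(top, 0, 0, null);
        g.drawImage(bottom, 0, top.getHeight(), null);
        g.dispose();
        return out;
    }

    /** Encodes an image as JPEG bytes using the JDK's built-in writer. */
    static byte[] toJpeg(BufferedImage img) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        if (!ImageIO.write(img, "jpg", buf)) {
            throw new IOException("no JPEG writer available");
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Two blank images stand in for the real scans
        BufferedImage a = new BufferedImage(400, 300, BufferedImage.TYPE_INT_RGB);
        BufferedImage b = new BufferedImage(300, 200, BufferedImage.TYPE_INT_RGB);
        BufferedImage combined = stackVertically(a, b);
        System.out.println(combined.getWidth() + "x" + combined.getHeight());
        System.out.println("JPEG bytes: " + toJpeg(combined).length);
    }
}
```

With real files, load each input with `ImageIO.read(new File(...))` (it returns `null` for formats it cannot decode) and save the JPEG bytes with `Files.write`; wrapping the result in a PDF is still a separate step.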

2 comments

My approach (in Java) was to clean up the image with a set of BoofCV filters, then run tess4j OCR to make the document searchable, and finally use Apache PDFBox to create a PDF with an invisible text layer. It's not open source yet (I plan to open it up), but you could take a look at https://github.com/ctodobom/OpenNoteScanner, which seems to be much more advanced.
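The "invisible text layer" idea relies on PDF text render mode 3 (`3 Tr`, neither fill nor stroke): the scan is painted as the page image and the OCR words are placed on top, so they can be searched and selected but are not drawn. Below is a dependency-free sketch of that idea that writes a one-page PDF by hand instead of going through Apache PDFBox; it is not the approach described above, just an illustration of the PDF mechanism. `OcrWord` is a hypothetical stand-in for whatever tess4j returns, and the class and method names are my own. It assumes a baseline three-component (RGB) JPEG, ASCII-only text (the built-in Helvetica font covers little else), and 1 image pixel = 1 PDF point.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.imageio.ImageIO;

public class InvisibleTextPdf {
    /** Hypothetical OCR result: a word, its baseline position in image pixels, and a font size. */
    static final class OcrWord {
        final String text;
        final float x, y, size;

        OcrWord(String text, float x, float y, float size) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.size = size;
        }
    }

    /** Escapes the characters that are special inside a PDF literal string. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)");
    }

    private static void write(ByteArrayOutputStream out, String s) {
        out.writeBytes(s.getBytes(StandardCharsets.US_ASCII)); // non-ASCII becomes '?'
    }

    /** Builds a one-page PDF: the JPEG fills the page and the OCR words sit on top, invisible. */
    static byte[] build(byte[] jpeg, int width, int height, List<OcrWord> words) {
        StringBuilder content = new StringBuilder();
        // Image XObjects are drawn into a unit square; scale it up to cover the page
        content.append("q ").append(width).append(" 0 0 ").append(height).append(" 0 0 cm /Im0 Do Q\n");
        content.append("BT 3 Tr\n"); // render mode 3: neither fill nor stroke
        for (OcrWord w : words) {
            // Image y grows downwards, PDF y grows upwards, so flip it
            content.append(String.format(Locale.ROOT, "/F1 %.2f Tf 1 0 0 1 %.2f %.2f Tm (%s) Tj\n",
                    w.size, w.x, height - w.y, escape(w.text)));
        }
        content.append("ET\n");
        byte[] contentBytes = content.toString().getBytes(StandardCharsets.US_ASCII);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        List<Integer> offsets = new ArrayList<>(); // byte offset of each object, for the xref table
        write(out, "%PDF-1.4\n");
        offsets.add(out.size());
        write(out, "1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n");
        offsets.add(out.size());
        write(out, "2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n");
        offsets.add(out.size());
        write(out, "3 0 obj\n<< /Type /Page /Parent 2 0 R /MediaBox [0 0 " + width + " " + height + "]"
                + " /Resources << /XObject << /Im0 4 0 R >> /Font << /F1 5 0 R >> >> /Contents 6 0 R >>\nendobj\n");
        offsets.add(out.size());
        write(out, "4 0 obj\n<< /Type /XObject /Subtype /Image /Width " + width + " /Height " + height
                + " /ColorSpace /DeviceRGB /BitsPerComponent 8 /Filter /DCTDecode /Length " + jpeg.length
                + " >>\nstream\n");
        out.writeBytes(jpeg); // JPEG bytes are embedded as-is (DCTDecode)
        write(out, "\nendstream\nendobj\n");
        offsets.add(out.size());
        write(out, "5 0 obj\n<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>\nendobj\n");
        offsets.add(out.size());
        write(out, "6 0 obj\n<< /Length " + contentBytes.length + " >>\nstream\n");
        out.writeBytes(contentBytes);
        write(out, "\nendstream\nendobj\n");

        int xref = out.size();
        write(out, "xref\n0 " + (offsets.size() + 1) + "\n0000000000 65535 f \n");
        for (int off : offsets) {
            write(out, String.format(Locale.ROOT, "%010d 00000 n \n", off)); // fixed 20-byte entries
        }
        write(out, "trailer\n<< /Size " + (offsets.size() + 1) + " /Root 1 0 R >>\nstartxref\n" + xref + "\n%%EOF\n");
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // A blank 200x100 image stands in for the cleaned-up scan
        BufferedImage scan = new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB);
        ByteArrayOutputStream jpg = new ByteArrayOutputStream();
        ImageIO.write(scan, "jpg", jpg);
        List<OcrWord> words = List.of(new OcrWord("Invoice", 10, 30, 12), new OcrWord("(total)", 80, 30, 12));
        byte[] pdf = build(jpg.toByteArray(), scan.getWidth(), scan.getHeight(), words);
        System.out.println("PDF size: " + pdf.length + " bytes");
    }
}
```

Writing `pdf` to disk (for example with `Files.write`) and opening it should show the blank page while a text search for "Invoice" still finds a match. A library like PDFBox handles what this sketch skips: font embedding for non-Latin text, multiple pages, and compression of the content stream.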
Look into ImageMagick and pdftk. That will solve your problems.