by darklajid 4376 days ago
I'm confused. What is setting you off?

It does look like a scan, yes. You run a full-page OCR engine over it to extract the text _with bounding rectangles_, then create a PDF whose page is the input image, with the OCR results embedded as invisible text at the recognized positions.
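To make the "invisible text at the recognized positions" part concrete, here is a rough sketch in plain Python. It is not how any particular engine or PDF library does it. It only shows the two ideas involved: converting OCR pixel boxes (origin top-left) into PDF points (origin bottom-left), and emitting text with PDF text rendering mode 3, which paints neither fill nor stroke. The `Word` class, the function names, and the `/F1` font resource name are made up for illustration, and a real library would also handle fonts, non-ASCII encoding, and stretching each word to its box width.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR result: text plus its bounding box in image pixels (hypothetical shape)."""
    text: str
    left: int    # pixels from the left edge of the scan
    top: int     # pixels from the top edge of the scan
    width: int
    height: int


def pdf_escape(s: str) -> str:
    """Escape backslashes and parentheses for a PDF literal string."""
    return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def invisible_text_ops(words, image_height_px: int, dpi: float, font: str = "/F1") -> str:
    """Build content-stream operators that place each word invisibly over the image.

    Assumes the page is the scan drawn at its physical size, so one pixel
    is 72/dpi PDF points. The font name is an assumed resource on the page.
    """
    scale = 72.0 / dpi
    ops = ["BT", "3 Tr"]  # begin text; rendering mode 3 = invisible
    for w in words:
        x = w.left * scale
        # Flip the y axis: PDF measures from the bottom of the page.
        y = (image_height_px - (w.top + w.height)) * scale
        size = w.height * scale  # crude: use the box height as the font size
        ops.append(f"{font} {size:.2f} Tf")
        ops.append(f"1 0 0 1 {x:.2f} {y:.2f} Tm")
        ops.append(f"({pdf_escape(w.text)}) Tj")
    ops.append("ET")
    return "\n".join(ops)


if __name__ == "__main__":
    # A 300 dpi scan of a US Letter page is 2550 x 3300 pixels.
    words = [Word("Hello", left=300, top=300, width=600, height=60)]
    print(invisible_text_ops(words, image_height_px=3300, dpi=300))
```

The resulting operators would go into the same page content stream that draws the scanned image, so selecting or searching text in a viewer hits the invisible layer while the eye only sees the scan.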

It's something I do day in, day out in this line of business. It requires no skill beyond a decent engine and a 'create your own PDF' library (iText/iTextSharp is cool, go and buy a license. Not affiliated, but I'm a happy user).