Hacker News new | ask | show | jobs
by RandomBookmarks 3147 days ago
The ability to create searchable PDFs is very useful and convenient. But creating searchable PDFs does not require a deep understanding of the document format (like column detection etc). You just place the words at the right coordinates of their bounding boxes. You can test this for example here: https://ocr.space - select the option to create a searchable PDF. It works even for the most complex documents.

Now, creating a Word document from a scan is a different beast because it requires layout analysis. This is where Abbyy with its long experience still has a good lead.