by clinta 2077 days ago
OCR is only an issue when the PDF is an image-only file that came from a scanner.

I'm in the staffing industry and deal with automatic resume-parsing tools. They have no problem with text PDFs that are saved from the source.

1 comment

I’m the founder of a startup that has eliminated the issue you mention. Our documents-to-database service handles arbitrary rotation, skews, and offsets.

Here is an example of it handling a scan of a document that’s rotated ~100 degrees and physically cut in half with scissors: https://siftrics.com/hydra.html