Hacker News
by efields 2757 days ago
Is off-the-shelf open source OCR not reliable for an image of reasonable fidelity, like a smartphone photo of a B&W text document?

I ask because it feels like I should have an app that lets me scan with my phone, runs OCR on the text, and then lets me do a plain-text search across every document I've scanned.

The first part only made it into iOS Notes natively a year or two ago, but the whole workflow above should work out of the box, IMHO…
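A minimal sketch of the search half of that app, assuming the OCR step has already produced plain text for each scan (for example from an engine like Tesseract; that step is not shown here). The `ScanArchive` class and the document names are hypothetical. It builds an inverted index from words to documents and returns the documents that contain every query term.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


class ScanArchive:
    """Hypothetical plain-text search over OCR'd scans via an inverted index."""

    def __init__(self) -> None:
        # word -> set of document ids containing that word
        self.index: dict[str, set[str]] = defaultdict(set)

    def add(self, doc_id: str, ocr_text: str) -> None:
        # ocr_text is assumed to be the OCR engine's output for one photo.
        for token in tokenize(ocr_text):
            self.index[token].add(doc_id)

    def search(self, query: str) -> set[str]:
        # A document matches only if it contains every query term.
        terms = tokenize(query)
        if not terms:
            return set()
        results = set(self.index.get(terms[0], set()))
        for term in terms[1:]:
            results &= self.index.get(term, set())
        return results


archive = ScanArchive()
archive.add("lease.jpg", "Residential Lease Agreement between the landlord and tenant")
archive.add("receipt.jpg", "Hardware store receipt: 2 x paint, total 41.98")
print(archive.search("lease agreement"))  # {'lease.jpg'}
```

An inverted index keeps lookups cheap as the number of scans grows, since a query only touches the entries for its own terms.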

3 comments

There's a difference between doing OCR and actually understanding what is what in the document content.

For normal text, OCR works well. But automatically understanding what is what is more complex.

This ^^

And understanding the context of what you're running OCR on can work backward to determine what the text actually is: e.g., if it's a "Name" field, the probabilities of ambiguous letters may change (in the case of handwriting recognition).
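To illustrate the idea, here is a toy sketch (not how any particular engine does it) where the recognizer's scores for an ambiguous glyph are multiplied by a per-field prior and renormalized, Bayes-style. The field types, the 0.95/0.05 priors, and the candidate scores are all made-up numbers for illustration.

```python
def char_prior(field_type: str, ch: str) -> float:
    # Hypothetical priors: names are mostly letters, phone numbers mostly digits.
    if field_type == "name":
        return 0.95 if ch.isalpha() else 0.05
    if field_type == "phone":
        return 0.95 if ch.isdigit() else 0.05
    return 0.5  # unknown field: no preference


def rescore(candidates: dict[str, float], field_type: str) -> dict[str, float]:
    # Multiply each recognizer score by the field prior, then renormalize.
    scores = {ch: p * char_prior(field_type, ch) for ch, p in candidates.items()}
    total = sum(scores.values())
    return {ch: s / total for ch, s in scores.items()}


def best_reading(candidates: dict[str, float], field_type: str) -> str:
    scores = rescore(candidates, field_type)
    return max(scores, key=scores.get)


# A handwritten glyph the recognizer can't decide on: zero or capital O?
glyph = {"0": 0.55, "O": 0.45}
print(best_reading(glyph, "name"))   # O
print(best_reading(glyph, "phone"))  # 0
```

The same raw recognizer output resolves differently depending on which field it came from.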

No, open source OCR doesn't work that great. I work for a telecom company and we process millions of documents a month. We built everything in-house and can now process them at almost 40 cents per 1,000 documents. Processing large documents like payslips is a long pipeline: text boundary detection, word identification, spatial clustering, and writing parsers (which depend on word, segment, and clustering probabilities) that extract the required fields from the documents.

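A heavily simplified sketch of two of those steps (spatial clustering and field parsing), assuming an earlier detection step has already returned each word with a box position. The `WordBox` type, the 5-pixel line tolerance, and the "Net Pay" regex are hypothetical. A real parser like the one described would also weigh word, segment, and clustering probabilities, which this sketch ignores.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class WordBox:
    # Hypothetical output of word detection: text plus box position.
    text: str
    x: float  # left edge
    y: float  # vertical center


def cluster_lines(words: list[WordBox], y_tol: float = 5.0) -> list[str]:
    # Greedy 1-D clustering: a word joins the current line if its vertical
    # center is within y_tol of that line's first word; otherwise it starts a new line.
    lines: list[list[WordBox]] = []
    for w in sorted(words, key=lambda b: b.y):
        if lines and abs(w.y - lines[-1][0].y) <= y_tol:
            lines[-1].append(w)
        else:
            lines.append([w])
    # Within each line, restore left-to-right reading order.
    return [" ".join(b.text for b in sorted(line, key=lambda b: b.x)) for line in lines]


def parse_net_pay(lines: list[str]) -> Optional[str]:
    # Hypothetical field parser: a "Net Pay" label followed by an amount.
    pattern = re.compile(r"net pay\s*:?\s*([\d,]+\.\d{2})", re.IGNORECASE)
    for line in lines:
        m = pattern.search(line)
        if m:
            return m.group(1)
    return None


# Words arrive in arbitrary order with slightly uneven baselines.
words = [
    WordBox("Pay:", 60, 101), WordBox("Net", 10, 100), WordBox("2,450.00", 120, 99),
    WordBox("Employee:", 10, 50), WordBox("J.", 90, 51), WordBox("Doe", 110, 50),
]
lines = cluster_lines(words)
print(lines)                 # ['Employee: J. Doe', 'Net Pay: 2,450.00']
print(parse_net_pay(lines))  # 2,450.00
```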
This is an Evernote feature. Dropbox also launched this feature.

Evernote is an interesting case.

They store every word that MAY be in the scanned document.

So their OCR engine will find a lot of legitimate words, but it will also find a lot of words that don't make sense.

When you enter a search term, it looks at the entire index (both the legitimate words and the garbage) and returns the documents that match.

I think it's quite clever.

Bear in mind that this was many years ago; I have no idea if it's still the case.
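A toy sketch of that indexing strategy, assuming the recognizer returns several candidate readings per word position. Every candidate, plausible or not, goes into an inverted index; garbage entries like "lnvoice" cost some space but are harmless because nobody searches for them. The document names and candidates are invented, and this is not Evernote's actual implementation.

```python
from collections import defaultdict

# Hypothetical OCR output: for each word position in a document, every
# candidate the recognizer considered plausible (real words and garbage alike).
docs = {
    "note1.jpg": [["invoice", "lnvoice", "involce"], ["total", "tota1"]],
    "note2.jpg": [["meeting", "rneeting"], ["agenda"]],
}


def build_index(docs: dict[str, list[list[str]]]) -> dict[str, set[str]]:
    # Index every candidate at every position, not just the best guess.
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, positions in docs.items():
        for candidates in positions:
            for word in candidates:
                index[word.lower()].add(doc_id)
    return index


def search(index: dict[str, set[str]], term: str) -> set[str]:
    return index.get(term.lower(), set())


index = build_index(docs)
print(search(index, "invoice"))  # {'note1.jpg'}
```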

Yeah, Evernote's OCR engine will generate possible candidates for every given word and will sort them internally by confidence score.

Screenshot: https://s24953.pcdn.co/blog/wp-content/uploads/2018/02/longh...

Since it's aimed not at transcription (the user doesn't know what he's looking for) but at retrieval (the user knows what he's looking for), it can get away with mistakes.
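A small sketch of why retrieval tolerates mistakes that transcription can't, assuming each word position comes with scored candidates (the scores below are made up). Transcription has to commit to the single highest-confidence reading, while a search only needs the query to match any candidate.

```python
# Hypothetical recognizer output for one scanned line: per word position,
# a list of (candidate, confidence) pairs.
line = [
    [("Dear", 0.97)],
    [("Mr", 0.91), ("Mn", 0.05)],
    [("Sm1th", 0.52), ("Smith", 0.45)],
]


def transcribe(line: list[list[tuple[str, float]]]) -> str:
    # Transcription must commit: take the highest-confidence candidate.
    return " ".join(max(cands, key=lambda c: c[1])[0] for cands in line)


def contains(line: list[list[tuple[str, float]]], query: str) -> bool:
    # Retrieval only asks "could this word be here?": any candidate counts.
    q = query.lower()
    return any(word.lower() == q for cands in line for word, _ in cands)


print(transcribe(line))         # Dear Mr Sm1th  (wrong transcription)
print(contains(line, "smith"))  # True           (search still finds it)
```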

References:

https://evernote.com/blog/how-evernotes-image-recognition-wo...

https://help.evernote.com/hc/en-us/articles/208314518-How-Ev...

https://evernote.com/blog/evernote-indexing-system/

Yep, it's quite clever for searching, but much less useful if you want to do something with the recognized text.

OneNote can do transcription (copy text from image).