They store every word that MAY be in the scanned document.
So their OCR engine will find a lot of legitimate words, but it will also find a lot of words that don't make sense.
When you search for a term, it looks at the entire index (both legit words and the garbage) and returns the documents that match.
I think it's quite clever.
Bear in mind that this is how the feature worked many years ago; I have no idea if this is still the case.
Screenshot: https://s24953.pcdn.co/blog/wp-content/uploads/2018/02/longh...
Since it's aimed not at transcription (user doesn't know what he's looking for) but at retrieval (user knows what he's looking for), it can get away with mistakes.
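To make the idea concrete, here's a rough sketch in Python. This is not Evernote's actual code: build_index, search, the file names and the OCR guesses are all made up for illustration. It only shows why keeping the garbage candidates in the index is harmless for retrieval, since nobody ever types them as a query.

```python
from collections import defaultdict


def build_index(ocr_candidates):
    """Map every candidate word (good or garbage) to the documents it may appear in."""
    index = defaultdict(set)
    for doc_id, candidates in ocr_candidates.items():
        for word in candidates:
            index[word.lower()].add(doc_id)
    return index


def search(index, term):
    """Return the documents whose candidate words include the search term."""
    # .get() avoids creating empty entries in the defaultdict for misses.
    return sorted(index.get(term.lower(), set()))


# Hypothetical OCR output: each image yields several guesses per word,
# including misreadings such as "c0ffee" or "roadrnap".
ocr_candidates = {
    "receipt.jpg": ["total", "tota1", "t0tal", "coffee", "c0ffee"],
    "whiteboard.jpg": ["roadmap", "roadrnap", "launch", "1aunch"],
}

index = build_index(ocr_candidates)
print(search(index, "coffee"))   # ['receipt.jpg']
print(search(index, "Roadmap"))  # ['whiteboard.jpg']
print(search(index, "invoice"))  # []
```

The garbage entries cost some storage, but they only ever match if someone searches for exactly that garbage, so the real words still find the right documents.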
References:
https://evernote.com/blog/how-evernotes-image-recognition-wo...
https://help.evernote.com/hc/en-us/articles/208314518-How-Ev...
https://evernote.com/blog/evernote-indexing-system/