Hacker News
by oefrha 670 days ago
Last time I used tesseract (a year ago?) it was still pretty useless if your text wasn’t on a clean background. It didn’t even come close to Apple’s proprietary on-device OCR.

There is a whole page on their site dedicated to methods for improving the accuracy: https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html

I think most frontends to tesseract employ a lot of these methods and maybe more... but trying to use tesseract directly can indeed be difficult without extra processing of the image first.
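
To give a rough idea of what that extra processing can look like, here is a minimal sketch of binarisation, one of the steps that page covers, using Otsu's method in plain Python. The function names and the flat list of 0-255 grayscale values are assumptions made up for illustration, not part of tesseract or any frontend, and a real pipeline would more likely use an image library.

```python
def otsu_threshold(pixels):
    """Pick the gray level (0-255) that best separates dark from light pixels.

    `pixels` is a flat list of integer grayscale values (hypothetical input format).
    Returns the threshold t maximizing the between-class variance, where
    class 0 is pixels <= t and class 1 is pixels > t.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels at or below the candidate threshold
    sum0 = 0    # sum of gray levels of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # no dark pixels yet
        w1 = total - w0
        if w1 == 0:
            break             # no light pixels left
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var = w0 * w1 * (mean0 - mean1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(pixels, threshold):
    """Map every pixel to pure black (0) or pure white (255)."""
    return [255 if p > threshold else 0 for p in pixels]


if __name__ == "__main__":
    # Tiny made-up sample: three dark "ink" pixels, three light "paper" pixels.
    sample = [30, 40, 35, 200, 210, 220]
    t = otsu_threshold(sample)
    print(t, binarize(sample, t))
```

A single global threshold like this is the simple case; it is not going to rescue text on a busy photo background by itself, which is where the other steps on that page (rescaling, noise removal, deskewing) come in.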

I know, I tried many things with the photo collection I was working with, including advice from that very page, generally with relatively poor results. (I ended up using Apple’s framework on macOS.) The point is tesseract is definitely not “smarter” in any way; at best it’s on par with Apple’s OCR when you hand it very clean text.