Hacker News
by ranger_danger 670 days ago
There is a whole page on their site dedicated to methods for improving the accuracy: https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html

I think most frontends to Tesseract apply many of these methods, and maybe more, but using Tesseract directly can indeed be difficult without preprocessing the image first.
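As an illustration of the kind of preprocessing that page describes, here is a rough sketch of one step it lists, binarisation, using Otsu's method. This is not Tesseract's own code, and the helper names `otsu_threshold` and `binarize` are hypothetical. To stay dependency-free, it assumes the image is already grayscale and stored as nested lists of 0-255 ints. In practice you would more likely load the image with Pillow or OpenCV (e.g. `cv2.threshold` with `cv2.THRESH_BINARY + cv2.THRESH_OTSU`) and then pass the result to Tesseract, for example via `pytesseract.image_to_string`.

```python
from typing import List

# Assumed representation: grayscale pixels, 0 (black) .. 255 (white).
Image = List[List[int]]


def otsu_threshold(img: Image) -> int:
    """Return the threshold t that maximises between-class variance,
    treating pixels <= t as one class and pixels > t as the other."""
    hist = [0] * 256
    for row in img:
        for p in row:
            hist[p] += 1

    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels with intensity <= t
    sum_bg = 0  # sum of intensities of those pixels
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        w_fg = total - w_bg
        if w_bg == 0 or w_fg == 0:
            continue  # one class is empty, so this split is meaningless
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(img: Image) -> Image:
    """Map every pixel to pure black (0) or pure white (255)
    using the Otsu threshold computed for the whole image."""
    t = otsu_threshold(img)
    return [[255 if p > t else 0 for p in row] for row in img]


if __name__ == "__main__":
    # Tiny hypothetical "scan": dark ink (20-30) on a light, uneven background.
    scan = [
        [230, 225, 30, 235],
        [20, 228, 25, 240],
    ]
    print(binarize(scan))
```

A single global threshold like this works for evenly lit scans. Photos with shadows or uneven lighting usually need adaptive (local) thresholding or other cleanup, which is consistent with poor results on photo collections.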


I know. I tried many things with the photo collection I was working with, including advice from that very page, generally with relatively poor results. (I ended up using Apple’s framework on macOS.) The point is that Tesseract is definitely not “smarter” in any way; at best it’s on par with Apple’s OCR when you hand it very clean text.