| HN Mirror



	by oefrha 717 days ago
	Last time I used tesseract (a year ago?) it was still pretty useless if your text wasn’t on a clean background. It didn’t even come close to Apple’s proprietary on-device OCR.


ranger_danger 717 days ago

There is a whole page on their site dedicated to methods for improving the accuracy: https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html

I think most frontends to tesseract employ a lot of these methods and maybe more... but trying to use tesseract directly can indeed be difficult without extra processing of the image first.
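To give a rough idea of what that extra processing can look like, here is a minimal sketch of one method of the kind that page covers: binarising the image with an Otsu threshold before handing it to the OCR engine. It is only an illustration. The function names `otsu_threshold` and `binarize` are made up for this example, and the image is assumed to be a flat list of 8-bit grayscale values rather than a real image file.

```python
def otsu_threshold(pixels):
    """Return a threshold in 0..255 that maximises between-class variance.

    `pixels` is assumed to be a flat list of 8-bit grayscale values.
    Pixels <= threshold form one class, pixels > threshold the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the darker class so far
    sum0 = 0    # sum of intensities in the darker class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # darker class still empty
        w1 = total - w0
        if w1 == 0:
            break             # brighter class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(pixels, threshold):
    """Map every pixel to pure black (0) or pure white (255)."""
    return [255 if p > threshold else 0 for p in pixels]


# Hypothetical example: dark text pixels (~50) on a bright background (~200).
sample = [50] * 10 + [200] * 10
t = otsu_threshold(sample)
clean = binarize(sample, t)
```

In practice you would more likely do this with an image library and then run tesseract on the result. On a busy photo background, a single global threshold like this may not be enough.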


oefrha 717 days ago

I know. I tried many things with the photo collection I was working with, including advice from that very page, generally with relatively poor results. (I ended up using Apple’s framework on macOS.) The point is that tesseract is definitely not “smarter” in any way; at best it’s on par with Apple’s OCR when you hand it very clean text.
