by awfullyjohn 4009 days ago

I've tried Tesseract before, the biggest of the open source OCR libraries. It ranges from "okay" to "terrible" depending on the application.

Our particular application was OCRing brick-and-mortar store receipts directly from emulated printer feeds (imagine printing straight to PDF). We found that Tesseract has too many built-in goodies aimed at scanned images, such as handling for warped characters, lighting and shadow defects, and photographic artifacts. When applied to what should be clean, 1-to-1 character recognition, it failed miserably.

We found that building our own software to recognize the characters on a 1 to 1 basis produced much better results. See: http://stackoverflow.com/questions/9413216/simple-digit-reco...
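
To give a rough idea of what "1 to 1" recognition can look like (this is a minimal sketch, not our actual code): if the feed renders every character with a pixel-identical glyph at a fixed width, recognition can reduce to slicing the bitmap into cells and looking each one up. The 3x5 font, the `render` helper, and the `recognize` function below are all hypothetical stand-ins for illustration; a real feed would use the printer's own font and layout.

```python
# Hypothetical 3x5 bitmap font, standing in for the printer's real font.
FONT = {
    "0": ("###", "# #", "# #", "# #", "###"),
    "1": (" # ", "## ", " # ", " # ", "###"),
    "2": ("###", "  #", "###", "#  ", "###"),
    "7": ("###", "  #", "  #", "  #", "  #"),
}
GLYPH_W = 3  # glyph width in pixels (assumed fixed-width font)
GLYPH_H = 5  # glyph height in pixels
GAP = 1      # blank columns between adjacent glyphs

# Reverse table: exact glyph bitmap -> character.
LOOKUP = {rows: ch for ch, rows in FONT.items()}


def render(text):
    """Simulate the emulated printer: draw text as rows of '#' and ' '."""
    return [
        (" " * GAP).join(FONT[ch][y] for ch in text)
        for y in range(GLYPH_H)
    ]


def recognize(rows):
    """Cut the bitmap into fixed-width cells and look each one up exactly."""
    step = GLYPH_W + GAP
    chars = []
    for x in range(0, len(rows[0]), step):
        cell = tuple(row[x:x + GLYPH_W] for row in rows)
        # Anything that is not a pixel-exact match is reported as unknown.
        chars.append(LOOKUP.get(cell, "?"))
    return "".join(chars)


if __name__ == "__main__":
    print(recognize(render("1207")))
```

Exact lookup only works because the input has no noise, skew, or scaling. That is the opposite of the scanned-photo case Tesseract is tuned for, where you would need some tolerance for imperfect matches.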