
by bdowling 746 days ago

Many of the free or cheap OCR services are based on the free, open-source Tesseract OCR.

https://github.com/tesseract-ocr/tesseract/

Those services usually do not expose all of Tesseract's options. If you're handy with shell scripts or Python, you can probably get better results by running Tesseract yourself and hand-tuning the options for your particular images. For example, if I recall correctly, there are page segmentation modes (the `--psm` option) that tell Tesseract what layout to expect, such as a single column or a single uniform block of text. Picking the right mode alone might get you better results than the automatic mode.
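As a rough sketch of what that tuning could look like, here is a small Python wrapper around the `tesseract` command-line tool that runs the same image through a few page segmentation modes so you can compare the output. It assumes `tesseract` is installed and on your PATH with the `eng` language data. The helper names `build_command` and `ocr` are made up for this example, and which mode works best depends entirely on your images.

```python
import subprocess
import sys

# A few page segmentation modes, as listed by `tesseract --help-psm`.
PSM_AUTO = 3           # fully automatic page segmentation (the default)
PSM_SINGLE_COLUMN = 4  # assume a single column of text of variable sizes
PSM_SINGLE_BLOCK = 6   # assume a single uniform block of text


def build_command(image_path, psm=PSM_AUTO, lang="eng"):
    """Build the tesseract command line (hypothetical helper).

    Passing "stdout" as the output base makes tesseract print the
    recognized text instead of writing a .txt file.
    """
    return ["tesseract", image_path, "stdout", "--psm", str(psm), "-l", lang]


def ocr(image_path, psm=PSM_AUTO, lang="eng"):
    """Run tesseract on one image and return the recognized text."""
    result = subprocess.run(
        build_command(image_path, psm, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract fails
    )
    return result.stdout


if __name__ == "__main__":
    # Usage: python ocr_modes.py scan.png
    for psm in (PSM_AUTO, PSM_SINGLE_COLUMN, PSM_SINGLE_BLOCK):
        print(f"--- psm {psm} ---")
        print(ocr(sys.argv[1], psm))
```

Running it on a handful of representative scans and eyeballing the differences is usually enough to tell whether a non-default mode is worth using.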