by bdowling 746 days ago
Many of the free or cheap OCR services are based on the free, open-source Tesseract OCR.

https://github.com/tesseract-ocr/tesseract/

Those services usually do not expose all of Tesseract's options. If you’re handy with shell scripts or Python, you can probably get better recognition results by hand-tuning the options for your particular images. For example, if I recall correctly, there are page segmentation options that tell Tesseract what layout to expect, such as multi-column text. That alone might give you better results than the automatic mode.
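
A rough sketch of what that tuning could look like from Python, assuming the `tesseract` binary (version 4 or later, where the flag is spelled `--psm`) is installed and on your PATH. The helper names and the file name `scan.png` are made up for illustration. The mode numbers come from Tesseract's own help text: 3 is the fully automatic default, 4 assumes a single column of text of variable sizes, and 6 assumes a single uniform block of text. Which one works best depends on your images, so the idea is simply to run a few and compare the output.

```python
import shutil
import subprocess


def build_tesseract_cmd(image_path, psm=3, lang="eng"):
    """Build a Tesseract CLI call that writes recognized text to stdout.

    psm is the page segmentation mode passed via --psm (3 is the default,
    fully automatic mode).
    """
    return ["tesseract", str(image_path), "stdout", "--psm", str(psm), "-l", lang]


def ocr(image_path, psm=3, lang="eng"):
    """Run Tesseract on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    result = subprocess.run(
        build_tesseract_cmd(image_path, psm, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if Tesseract exits non-zero
    )
    return result.stdout


def compare_psm_modes(image_path, modes=(3, 4, 6)):
    """Return {psm: text} so the outputs can be compared by eye."""
    return {psm: ocr(image_path, psm, "eng") for psm in modes}


if __name__ == "__main__":
    # "scan.png" is a placeholder; substitute one of your own images.
    for psm, text in compare_psm_modes("scan.png").items():
        print(f"--- psm {psm} ---")
        print(text)
```

Running `tesseract --help-psm` lists every segmentation mode your installed version supports, which is a good starting point for picking candidates to compare.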