

by ElectricalUnion 1149 days ago

Tesseract might be "not very good" but it is still state-of-the-art, often available, with many languages supported.

The special sauce - what you need to get a better result - is good, adaptive thresholding (something more advanced than the naive binary thresholding you get when feeding raw color/grayscale images to OCR).

As far as I know, once you get that nailed it doesn't matter that much what OCR you use - as long as it's available and supports your target language.
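
To make "adaptive thresholding" concrete, here is a minimal sketch of one common variant (local-mean thresholding), written in plain Python so it runs without dependencies. The function names, the `window` and `offset` parameters, and their defaults are my own assumptions for illustration, not anything Tesseract does internally. In practice you would more likely reach for an existing implementation such as OpenCV's `cv2.adaptiveThreshold`.

```python
def integral_image(gray):
    """Summed-area table: ii[y][x] is the sum of gray[0:y][0:x]."""
    h, w = len(gray), len(gray[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def adaptive_threshold(gray, window=15, offset=10):
    """Binarize a grayscale image (list of rows, values 0-255).

    A pixel becomes black (0) when it is more than `offset` darker than
    the mean of the `window` x `window` neighbourhood around it (clipped
    at the image borders), otherwise white (255).
    `window` and `offset` are illustrative defaults, not tuned values.
    """
    h, w = len(gray), len(gray[0])
    ii = integral_image(gray)
    r = window // 2
    out = []
    for y in range(h):
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        row = []
        for x in range(w):
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            area = (y1 - y0) * (x1 - x0)
            # Sum of the neighbourhood in O(1) via the integral image.
            total = ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
            # Compare pixel against (local mean - offset) without division.
            is_ink = gray[y][x] * area < total - offset * area
            row.append(0 if is_ink else 255)
        out.append(row)
    return out


def global_threshold(gray, cutoff=128):
    """Naive binarization with one fixed cutoff for the whole image."""
    return [[0 if p < cutoff else 255 for p in row] for row in gray]
```

The point of the comparison: on a page with uneven lighting, a single global cutoff turns the darker half of the page black and can miss faint ink on the brighter half, while the local-mean version only marks pixels that are dark relative to their surroundings. The binarized output is what you would then hand to the OCR engine.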