by IceHegel 1302 days ago

In 2019 I was working on a project that involved OCRing millions of scanned historical documents. I evaluated Google, Azure, Amazon, Adobe, ABBYY, and Tesseract somewhat rigorously.

Google's OCR was by far the best, especially for obscured or malformed characters. Azure was second, and I ended up merging the results from both.

For my use case (in Spring 2019), Tesseract was not very accurate and struggled especially with slanted text. Hopefully that has changed.