by tastyminerals 2292 days ago
For accuracy and speed. The market SOTA Abbyy is far from being accurate.

> The market SOTA Abbyy is far from being accurate.

While Abbyy is likely the best, it's also incredibly expensive: roughly on the order of $0.01/page, or at best maybe a tenth of that at high volume.

For comparison, I run a bunch of OCR servers using the open-source Tesseract library. The machine time on one of the major cloud providers works out to roughly $0.01 per 100-1000 pages.

OCR.space charges only $10 for 100,000 conversions. The quality is good, but not as good as Abbyy.
It is the best and this is one of the reasons why PDF extraction is hard :)
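
To put the prices quoted in this thread on the same footing, here is a rough sketch that converts each one to a cost per page. The figures are the commenters' own estimates, not official price lists, and the labels and the `cost_per_page` helper are just names made up for this illustration.

```python
# Per-page cost implied by the figures quoted in this thread (USD).
# Each entry is (price, pages covered by that price). These are the
# commenters' rough estimates, not vendor price lists.
quotes = {
    "Abbyy (typical)": (0.01, 1),            # ~$0.01 per page
    "Abbyy (high volume)": (0.001, 1),       # "a tenth of that", at best
    "Tesseract (worst case)": (0.01, 100),   # $0.01 of machine time per 100 pages
    "Tesseract (best case)": (0.01, 1000),   # $0.01 of machine time per 1000 pages
    "OCR.space": (10.0, 100_000),            # $10 per 100,000 conversions
}


def cost_per_page(price, pages):
    """Hypothetical helper: normalize a quote to dollars per page."""
    return price / pages


per_page = {name: cost_per_page(price, pages) for name, (price, pages) in quotes.items()}

# Print from most to least expensive.
for name, cost in sorted(per_page.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<24} ${cost:.5f}/page")
```

On these numbers, OCR.space lands at the upper end of the Tesseract machine-time estimate (about $0.0001/page), while Abbyy comes out somewhere between 10x and 1000x the Tesseract figure depending on volume. The Tesseract number covers machine time only, so it leaves out whatever it costs to run and maintain the servers.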