by cratermoon 480 days ago
Agree wholeheartedly. Modern OCR is astonishingly good; more importantly, it's deterministically so. Its failure modes, when it's unable to read the text, are recognizably failures.

Results for VLM accuracy & precision are not good. https://arxiv.org/html/2406.04470v1#S4


Which solutions would you classify as "modern OCR"?

Are we talking Tesseract or something?

Probably something like Apple Vision Framework or Amazon Textract or Google's Cloud Vision.

Tesseract does well under ideal conditions, but the world is messy.

I was thinking of ABBYY FineReader, but those too. Instead of using VLMs or any sort of generative AI, they're built on good old-fashioned feature extraction and nearest-neighbor classifiers such as the k-nearest neighbors algorithm. It's possible to build a working prototype of this technique using basic ML algorithms.
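As a rough illustration of the feature-extraction-plus-nearest-neighbor idea, here is a minimal Python sketch. The 5x5 toy bitmaps, the choice of row/column projection profiles as features, and the names `extract_features` and `knn_classify` are all hypothetical choices made for illustration. This is not a description of how ABBYY FineReader or any other commercial engine works internally.

```python
import math
from collections import Counter

# Hypothetical toy training set: 5x5 binary bitmaps ("#" = ink), one per character.
# A real system would use many labeled samples per character, segmented and
# size-normalized from scanned pages.
TRAINING = [
    ("I", ["..#..", "..#..", "..#..", "..#..", "..#.."]),
    ("L", ["#....", "#....", "#....", "#....", "#####"]),
    ("T", ["#####", "..#..", "..#..", "..#..", "..#.."]),
    ("O", [".###.", "#...#", "#...#", "#...#", ".###."]),
]


def extract_features(bitmap):
    """Hand-crafted features: fraction of ink in each row and each column
    (horizontal and vertical projection profiles)."""
    height, width = len(bitmap), len(bitmap[0])
    rows = [row.count("#") / width for row in bitmap]
    cols = [sum(row[c] == "#" for row in bitmap) / height for c in range(width)]
    return rows + cols


def knn_classify(bitmap, training, k=1):
    """Label a glyph by majority vote among its k nearest training samples,
    using Euclidean distance in feature space."""
    query = extract_features(bitmap)
    neighbors = sorted(
        (math.dist(query, extract_features(glyph)), label)
        for label, glyph in training
    )
    # most_common() orders equal counts by first occurrence, and neighbors are
    # sorted by distance, so a tied vote goes to the label seen closest.
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# A "T" with its top-right pixel missing still lands nearest the clean "T".
NOISY_T = ["####.", "..#..", "..#..", "..#..", "..#.."]
print(knn_classify(NOISY_T, TRAINING))  # prints "T"
```

The classifier is fully deterministic: the same bitmap always yields the same distances and the same label. With several samples per character, raising `k` lets the vote smooth over individual noisy samples. A working system would also need page segmentation and size normalization, which this sketch skips.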