Ask HN: Is there no good OCR available?
2 points by leokster 671 days ago
I'm wondering if tools like Tesseract are still the open-source (and offline) gold standard. There are, in the meantime, document intelligence services from all large cloud providers, but there is still not really a usable AI model that is capable of doing good OCR (image, not necessarily scans -> text). Do you know any active projects or resources in that field?
2 comments

Apple’s operating systems have been doing stellar OCR since 2019. When the feature was announced I was uninterested, but now I’m surprised how much I use it. It works without any extra work in Preview, Safari, and other apps. You can call it programmatically via Shortcuts or the Vision APIs.

https://developer.apple.com/documentation/vision/recognizing...

(Edit: Never mind, sorry. I misread your question. I think you're mainly interested in free offline apps.)

Does it have to be an "AI" model in the modern sense of the term (LLMs, etc.)?

In the past, I found Google's Cloud Vision API to be pretty good for this sort of thing (text in images): https://cloud.google.com/vision?hl=en#demo

AFAIK Tesseract was never state of the art; it was just free and cheap. The commercial offerings (in my limited experience) were usually much more accurate.

Seconding Google's offering, which can read my chicken scratch reasonably well.