by iLemming 483 days ago
What's the fastest and most accurate CLI OCR tool? My use case is simple - I want to be able to grab a piece of the screen (Flameshot is great for that) and OCR it. I need this for note-taking during pair-programming over Zoom.

Currently I'm using tesseract - it works and it's fast, but it also makes mistakes. It would also be great if it could discern tabular data and put it into ASCII or Markdown tables. I've tried docling, but it feels like a bit of overkill, and it seems to be slower - remember, I need to be able to grab the text from the screenshot very quickly. I have only tried the default settings; maybe tweaking them would improve things.
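For reference, here is a rough sketch of the kind of wrapper I have in mind, not a tested solution. It assumes `tesseract` is on PATH, and it guesses that `--psm 6` plus `preserve_interword_spaces=1` keeps table columns far enough apart to split on. The function names and the "two or more spaces means a new column" rule are my own illustration and will misfire on many screenshots.

```python
import re
import subprocess
import sys


def build_tesseract_cmd(image_path, lang="eng", psm=6):
    """Build a tesseract command that prints recognized text to stdout."""
    return [
        "tesseract", image_path, "stdout",
        "-l", lang,
        "--psm", str(psm),  # 6 = assume a single uniform block of text
        "-c", "preserve_interword_spaces=1",  # keep wide gaps between columns
    ]


def ocr_image(image_path):
    """Run tesseract on a screenshot (requires tesseract on PATH)."""
    result = subprocess.run(build_tesseract_cmd(image_path),
                            capture_output=True, text=True, check=True)
    return result.stdout


def text_to_markdown_table(text):
    """Heuristic (assumption): runs of 2+ spaces separate columns."""
    rows = [re.split(r"\s{2,}", line.strip())
            for line in text.splitlines() if line.strip()]
    if not rows:
        return ""
    width = max(len(row) for row in rows)
    rows = [row + [""] * (width - len(row)) for row in rows]  # pad short rows
    lines = ["| " + " | ".join(row) + " |" for row in rows]
    lines.insert(1, "|" + " --- |" * width)  # header separator row
    return "\n".join(lines)


if __name__ == "__main__":
    # Usage: python ocr_table.py screenshot.png
    print(text_to_markdown_table(ocr_image(sys.argv[1])))
```

Bound to a shortcut after Flameshot saves the capture, something like this would stay about as fast as plain tesseract, since the table step is only string splitting.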

Can anyone share some thoughts on this? Thanks!

2 comments

Anything using the Apple Vision framework is fast and surprisingly accurate:

https://github.com/bytefer/macos-vision-ocr

Cool to see, I may use this locally for OCR in some cases. But I think the "handwriting" example is a little misleading. That's a font, not a scan of handwritten material.
This uses the old APIs, which are less accurate than the new Swift-only LiveText ones.
The AI OCR built into the Snipping Tool in Windows is better than tesseract, albeit less convenient than something like PowerToys or Capture2Text, which are triggered by a quick shortcut.