HN Mirror

	by asveikau 66 days ago
	What OCR do you guys use? I have only seen OCR that makes a lot of errors. Making it usable requires tons of manual review. I probably wouldn't trust an LLM to do that review because it may introduce its own errors. Edit: downvoters, would you like to answer my question? I would genuinely like to know. Based on the confidence of the comment above, I thought there must be a super accurate OCR I've never heard of, but after seeing the sibling comment I'm going to guess there isn't.

2 comments

zshn25 65 days ago

Stirling PDF (https://github.com/Stirling-Tools/Stirling-PDF) is a free, self-hosted PDF tool that can do very accurate OCR while keeping the formatting.

UltraSane 66 days ago

Modern OCR is VERY accurate. Heck, Adobe Acrobat Pro OCR was essentially perfect 20 years ago.

wl 66 days ago

One of my hobbies is typesetting modern editions of a certain type of rare, obscure old book that was poorly typeset to begin with. Modern OCR, and I've tried plenty of tools, is still rather error-prone in my application.

asveikau 66 days ago

Can you name a good open source one? I have spent many hours in the current decade correcting OCR errors, mostly from Tesseract.