HN Mirror



	by ritvikpandey21 181 days ago
	we disagree! we've found llms by themselves aren't enough and suffer from pretty big failure modes like hallucination and inferring text rather than pure transcription. we wrote a blog about this [1]. the right approach so far seems to be a hybrid workflow that uses very specific parts of the language model architecture. [1] https://www.runpulse.com/blog/why-llms-suck-at-ocr

3 comments

mritchie712 181 days ago

> Why LLMs Suck at OCR

I paste screenshots into Claude Code every day and it's incredible. As in, I can't believe how good it is. I send a screenshot of console logs, a UI and some HTML elements and it just "gets it".

So saying they "Suck" makes me not take your opinion seriously.


ritvikpandey21 181 days ago

yeah models are definitely improving, but we've found even the latest ones still hallucinate and infer text rather than doing pure transcription. we carry out very rigorous benchmarks against all of the frontier models. we think the differentiation is in accuracy on truly messy docs (nested tables, degraded scans, handwriting) and being able to deploy on-prem/vpc for regulated industries.


mikert89 181 days ago

they need to convince customers it's what they need


serjester 181 days ago

This is a hand-wavy article that dismisses VLMs without acknowledging the real-world performance everyone is seeing. I think it’d be far more useful if you published an eval.


mikert89 181 days ago

one or two more model releases, and raw documents passed to claude will beat whatever prompt voodoo you guys are cooking


holler 181 days ago

Having worked in the space, I have real doubts about that. Right now Claude and other top models already do a decent job at e.g. "generate OCR from this document". But as mentioned, there are serious failure modes, it's non-deterministic, and it's especially cost-prohibitive at scale.
