| HN Mirror


	by raincole 246 days ago
	If you can accept that the machine just makes up what it doesn't recognize instead of saying "I don't know," then yes, it's solved. (I'm not being snarky. It's acceptable in some cases.)

4 comments

jakewins 246 days ago

But this was very much the case with existing OCR software as well? In fairness, I guess the LLMs will end up making up plausible-looking text instead of text riddled with errors, which makes it much harder to catch the mistakes.

wahnfrieden 246 days ago

Existing OCR doesn’t skip over entire (legible) paragraphs or hallucinate entire sentences.

criddell 246 days ago

I usually run the image(s) through more than one converter then compare the results. They all have problems, but the parts they agree on are usually correct.
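
A minimal sketch of that comparison step, assuming the two transcriptions are already available as plain strings. The function name `agreed_spans` and the sample strings are made up for illustration; the diffing uses `difflib.SequenceMatcher` from the Python standard library, and the converters themselves are not shown.

```python
from difflib import SequenceMatcher


def agreed_spans(text_a: str, text_b: str):
    """Split two transcriptions into words and separate agreement from disputes.

    Returns (agreed, disputed): agreed is a list of words both outputs share
    in the same order; disputed is a list of (words_from_a, words_from_b)
    pairs where the outputs differ and a human should look.
    """
    a, b = text_a.split(), text_b.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    agreed, disputed = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            agreed.extend(a[i1:i2])
        else:
            disputed.append((a[i1:i2], b[j1:j2]))
    return agreed, disputed


# Hypothetical outputs from two different converters for the same image.
ocr_a = "Invoice total: 1,268.00 due on 12 March"
ocr_b = "Invoice total: 1,208.00 due on 12 March"

agreed, disputed = agreed_spans(ocr_a, ocr_b)
print(disputed)
```

Agreement is only evidence, not proof: two converters can make the same mistake, so the disputed list narrows where to look rather than guaranteeing the rest is right.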

Davidzheng 246 days ago

This rarely happens to me when using LLMs to transcribe PDFs.

KoolKat23 246 days ago

This must be some older/smaller model.

rkagerer 246 days ago

Good libraries gave results with embedded confidence levels for each unit recognized.
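
A rough illustration of why per-unit confidence is useful, assuming an engine that reports a score between 0.0 and 1.0 for each word. The `Word` type, the `flag_uncertain` helper, the 0.80 threshold, and the `[?...?]` marker are all hypothetical and not taken from any particular library.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    confidence: float  # assumed range 0.0 to 1.0, as reported by the OCR engine


def flag_uncertain(words, threshold=0.80):
    """Join words into a line, wrapping low-confidence ones in [?...?]."""
    return " ".join(
        w.text if w.confidence >= threshold else f"[?{w.text}?]"
        for w in words
    )


# Made-up recognition result for one line of an invoice.
words = [Word("Total", 0.99), Word("due:", 0.97), Word("1,268.00", 0.41)]
flagged = flag_uncertain(words)
print(flagged)
```

With scores like these, a reviewer only has to check the flagged words instead of rereading the whole page.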

red75prime 246 days ago

Just checked it with Gemini 2.5 Flash. Instructing it to mark low-confidence words seems to work OK(ish).
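
One way this could look in practice, as a sketch only: the prompt wording and the `[[word]]` marker convention are assumptions for illustration, not something Gemini defines, and the model call itself is left out. The code only shows how a marked-up reply might be post-processed.

```python
import re

# Hypothetical instruction sent along with the image.
PROMPT = (
    "Transcribe the text in this image exactly. "
    "Wrap any word you are not confident about in double square brackets, "
    "like [[word]]. Do not guess silently."
)

# Matches the assumed [[word]] marker and captures the word inside it.
MARKER = re.compile(r"\[\[(.+?)\]\]")


def uncertain_words(transcript: str) -> list[str]:
    """Return the words the model marked as low-confidence."""
    return MARKER.findall(transcript)


def strip_markers(transcript: str) -> str:
    """Return the transcript with the markers removed."""
    return MARKER.sub(r"\1", transcript)


# Made-up model reply that follows the marker convention.
reply = "Invoice [[1,268.00]] due on 12 [[Marcb]]"
print(uncertain_words(reply))
print(strip_markers(reply))
```

The marks are the model's own self-report, so they are a hint about where to look rather than a calibrated confidence score.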

KoolKat23 246 days ago

These days it does just that: it'll say null or whatever if you give it the option. When it does make something up, it tends to be a limitation of the image quality (max DPI).

Blotchy text and certain typefaces make 6's look like 8's, even to the non-discerning eye: a human would think it's an 8, then zoom in and see it's a 6.

Google's image quality on uploads is still streets ahead of OpenAI's, for instance, btw.

wahnfrieden 246 days ago

Do any LLM OCRs give bounding boxes anyway? Per character and per block.

kelvinjps10 246 days ago

Gemini does, but it's not as good as Google Vision, and the format is different. Here is the documentation: https://cloud.google.com/vertex-ai/generative-ai/docs/boundi...

Also, Simon Willison made a blog post that might be helpful: https://simonwillison.net/2024/Aug/26/gemini-bounding-box-vi...

I hope that this capability improves so I can use only Gemini API.
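
On the format difference: to my understanding, Gemini returns each box as `[ymin, xmin, ymax, xmax]` normalized to a 0 to 1000 range rather than as pixel coordinates, so check the linked documentation before relying on that. Under that assumption, a conversion to pixel coordinates might look like the sketch below; the helper name `to_pixels` and the sample numbers are made up.

```python
def to_pixels(box, width, height, scale=1000):
    """Convert an assumed [ymin, xmin, ymax, xmax] box on a 0..scale grid
    into (left, top, right, bottom) pixel coordinates for a width x height image.
    """
    ymin, xmin, ymax, xmax = box
    left = round(xmin * width / scale)
    top = round(ymin * height / scale)
    right = round(xmax * width / scale)
    bottom = round(ymax * height / scale)
    return left, top, right, bottom


# Made-up box for a 2000 x 1000 pixel page image.
box = [100, 250, 300, 750]
pixels = to_pixels(box, width=2000, height=1000)
print(pixels)
```

Note the y-first ordering in the assumed input: swapping it for the more common x-first order is an easy mistake that gives boxes that look plausible but land in the wrong place.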

dajonker 244 days ago

Try MinerU 2.5 with two-step parsing. It gives good results with bounding boxes per block. Not sure if you can get it to produce more detailed output, such as word- or character-level boxes.
