HN Mirror

Hacker News


	by wahnfrieden 246 days ago
	Do any LLM-based OCR tools give bounding boxes at all? Per character and per block.

2 comments

kelvinjps10 246 days ago

Gemini does, but it's not as good as Google Vision, and the output format is different. Here's the documentation: https://cloud.google.com/vertex-ai/generative-ai/docs/boundi...

Simon Willison also wrote a blog post that might be helpful: https://simonwillison.net/2024/Aug/26/gemini-bounding-box-vi...

I hope this capability improves so I can use only the Gemini API.
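To illustrate what "the format is different" means in practice, here is a minimal sketch that normalizes both kinds of output to pixel `(x0, y0, x1, y1)` boxes. It assumes Gemini returns each box as `[ymin, xmin, ymax, xmax]` scaled to 0 to 1000 (check the linked docs for the current format), and that Google Vision returns a `boundingPoly` as a list of `{"x": ..., "y": ...}` vertices in pixels, where a coordinate equal to 0 may be omitted from the JSON. Both helper names are hypothetical, not part of either API.

```python
def gemini_box_to_pixels(box_2d, width, height):
    """Hypothetical helper: convert a Gemini-style [ymin, xmin, ymax, xmax] box
    on a 0-1000 scale into pixel (x0, y0, x1, y1) for an image of width x height."""
    ymin, xmin, ymax, xmax = box_2d
    # Multiply before dividing to keep the arithmetic exact for integer inputs.
    return (
        round(xmin * width / 1000),
        round(ymin * height / 1000),
        round(xmax * width / 1000),
        round(ymax * height / 1000),
    )


def vision_vertices_to_box(vertices):
    """Hypothetical helper: convert Vision-style polygon vertices (pixel coords)
    into an axis-aligned (x0, y0, x1, y1) box. Missing x/y keys are treated as 0,
    assuming zero-valued fields are omitted from the JSON response."""
    xs = [v.get("x", 0) for v in vertices]
    ys = [v.get("y", 0) for v in vertices]
    return (min(xs), min(ys), max(xs), max(ys))


if __name__ == "__main__":
    # A Gemini box on a 2000x1000 image.
    print(gemini_box_to_pixels([100, 200, 300, 400], 2000, 1000))
    # A Vision word polygon.
    print(vision_vertices_to_box(
        [{"x": 10, "y": 20}, {"x": 50, "y": 20}, {"x": 50, "y": 60}, {"x": 10, "y": 60}]
    ))
```

Converting everything to one box convention makes it easy to overlay or compare results from the two services on the same image.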


dajonker 244 days ago

Try MinerU 2.5 with two-step parsing. It gives good results with per-block bounding boxes. I'm not sure whether you can get it to go more fine-grained, such as word or character level.
