


	by chad1n 475 days ago
	These "OCR" tools that are actually multimodal models are interesting because they can do more than plain text extraction, but their biggest flaws are hallucinations and their overall nondeterminism. Lately, I've been using Gemini to turn my notebooks into LaTeX documents, so I can see a pretty nice use case for this project, but not for "important" papers or papers that need 100% accuracy.


thelittleone 475 days ago

How about building a tool that indexes OCR chunks/tokens along with a confidence grade? Set a tolerance level and define actions for any tokens or chunks that fall below it. Actions could include automated verification using another model or, as a last resort, a human.
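A minimal sketch of that idea, assuming each OCR chunk already comes with a confidence score in [0, 1] (where that score comes from is left open). `Chunk`, `review_chunks`, and the `verify` callable are hypothetical names; `verify` stands in for "another model" re-reading the chunk. Agreement is tested with exact string equality, which is deliberately crude; a real tool would likely need a fuzzier comparison.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Chunk:
    chunk_id: int
    text: str
    confidence: float  # assumed in [0, 1], e.g. reported by the OCR engine


@dataclass
class Review:
    accepted: Dict[int, str] = field(default_factory=dict)
    human_queue: List[Chunk] = field(default_factory=list)


def review_chunks(
    chunks: List[Chunk],
    tolerance: float,
    verify: Callable[[Chunk], str],
) -> Review:
    """Accept chunks at or above the tolerance; send the rest to a second
    model, and escalate to a human only when the two readings disagree."""
    result = Review()
    for chunk in chunks:
        if chunk.confidence >= tolerance:
            result.accepted[chunk.chunk_id] = chunk.text
            continue
        second_reading = verify(chunk)  # hypothetical second-model call
        if second_reading == chunk.text:
            result.accepted[chunk.chunk_id] = chunk.text
        else:
            result.human_queue.append(chunk)  # last resort: human review
    return result


# Usage with a stub verifier that agrees only on chunk 1.
def stub_verifier(chunk: Chunk) -> str:
    return chunk.text if chunk.chunk_id == 1 else "???"


chunks = [
    Chunk(0, "Theorem 1.", 0.98),
    Chunk(1, "x^2 + y^2", 0.62),
    Chunk(2, "e^{i\\pi}", 0.40),
]
review = review_chunks(chunks, tolerance=0.9, verify=stub_verifier)
print(sorted(review.accepted))                   # [0, 1]
print([c.chunk_id for c in review.human_queue])  # [2]
```

Escalating in two stages keeps human effort for the chunks where the OCR score is low *and* an independent reading disagrees, rather than for every low-scoring chunk.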


Eisenstein 474 days ago

How would you calculate the confidence? LLMs are notoriously bad at grading their own output.
