by themanmaran
476 days ago
We also ran an OCR benchmark with an LLM as judge, using structured outputs. You can check out the full methodology on the repo [1], but the general idea is:

- Every document has ground truth text, a JSON schema, and the ground truth JSON.
- Run OCR on each document and pass the result to GPT-4o along with the JSON schema.
- Compare the predicted JSON against the ground truth JSON for accuracy.

In our benchmark, ground truth text => GPT-4o scored 99.7%+ accuracy, meaning that whenever GPT-4o was given the correct text, it could extract the structured JSON values ~100% of the time. So if we pass in the OCR text from Mistral and it scores 70%, the inaccuracies are isolated to OCR errors.

[1] https://github.com/getomni-ai/benchmark
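To make the comparison step concrete, here is a rough sketch of field-level scoring between a predicted JSON and the ground truth JSON. This is not the benchmark's actual scoring code (see the repo for that); the function names `flatten` and `json_accuracy`, the exact-match rule, and the sample invoice are all assumptions made up for illustration.

```python
from typing import Any


def flatten(value: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts and lists into {path: leaf_value}.

    Empty dicts and lists contribute no fields in this sketch.
    """
    fields: dict[str, Any] = {}
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            fields.update(flatten(child, path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            fields.update(flatten(child, f"{prefix}[{index}]"))
    else:
        fields[prefix] = value
    return fields


def json_accuracy(predicted: Any, truth: Any) -> float:
    """Fraction of ground truth leaf fields that the prediction matches exactly.

    Assumption: exact equality per field; a real benchmark may score differently.
    """
    truth_fields = flatten(truth)
    if not truth_fields:
        return 1.0
    predicted_fields = flatten(predicted)
    missing = object()  # sentinel so a missing field never equals a real value
    correct = sum(
        1
        for path, expected in truth_fields.items()
        if predicted_fields.get(path, missing) == expected
    )
    return correct / len(truth_fields)


# Hypothetical example: one OCR mistake ("0" read as "O") in a four-field document.
truth = {
    "invoice_number": "INV-001",
    "total": 120.5,
    "items": [{"name": "Widget", "qty": 2}],
}
predicted = {
    "invoice_number": "INV-O01",
    "total": 120.5,
    "items": [{"name": "Widget", "qty": 2}],
}

print(json_accuracy(predicted, truth))  # 0.75
```

Under this scheme a single misread character costs the whole field, which is why OCR errors show up directly in the extraction score even when the extraction model itself is near-perfect on clean text.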
Edit - I see it on the Benchmark page now. Woof, low 70% scores in some areas!
https://getomni.ai/ocr-benchmark