by themanmaran
472 days ago
Excited to test this out on our side as well. We recently built an OCR benchmarking framework specifically for VLMs[1][2], so we'll do a test run today. From our last benchmark run, some of these numbers from Mistral seem a little optimistic. Side by side of a few models:

model  | omni | mistral
gemini | 86%  | 89%
azure  | 85%  | 89%
gpt-4o | 75%  | 89%
google | 68%  | 83%

Currently adding the Mistral API and we'll get results out today!

[1] https://github.com/getomni-ai/benchmark
[2] https://huggingface.co/datasets/getomni-ai/ocr-benchmark
Mistral OCR:
- 72.2% accuracy
- $1/1000 pages
- 5.42s / page
Which is a pretty far cry from the 95% accuracy they were advertising from their private benchmark. The biggest thing I noticed is that it skips anything it classifies as an image/figure, so charts, infographics, some tables, etc. all get lifted out and returned as [image](image_002). The other VLMs, by comparison, are able to interpret those images into a text representation.
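If you want a rough sense of how much content gets dropped this way in your own documents, a minimal sketch like the one below counts the placeholders in the returned text. The regex is an assumption based only on the single `[image](image_002)` example above, and `count_image_placeholders` is a hypothetical helper, not part of the benchmark repo or the Mistral API.

```python
import re

# Assumed placeholder shape, inferred from the one example above:
# "[image](image_002)". Adjust if real output differs.
PLACEHOLDER = re.compile(r"\[image\]\(image_\d+\)")


def count_image_placeholders(ocr_text: str) -> int:
    """Return how many image placeholders appear in the OCR output text."""
    return len(PLACEHOLDER.findall(ocr_text))


if __name__ == "__main__":
    # Made-up sample text, for illustration only.
    sample = (
        "Revenue by quarter\n"
        "[image](image_002)\n"
        "See notes below.\n"
        "[image](image_003)\n"
    )
    print(count_image_placeholders(sample))
```

A high count on chart-heavy or table-heavy pages would be a hint that the text output is missing information that was only present in the figures.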
https://github.com/getomni-ai/benchmark
https://huggingface.co/datasets/getomni-ai/ocr-benchmark
https://getomni.ai/ocr-benchmark