by cxie
472 days ago
The new Mistral OCR release looks impressive: 94.89% overall accuracy and significantly better multilingual support than competitors. As someone who's built document processing systems at scale, I'm curious about the real-world implications. Has anyone tried this on specialized domains like medical or legal documents? The benchmarks are promising, but OCR has always faced challenges with domain-specific terminology and formatting. Also interesting to see the pricing model ($1/1000 pages) in a landscape where many expected this functionality to eventually be bundled into base LLM offerings. This feels like a trend where previously encapsulated capabilities are being unbundled into specialized APIs with separate pricing. I wonder if this is the beginning of the componentization of AI infrastructure, breaking monolithic models into specialized services that each do one thing extremely well.
|
From our last benchmark run, some of these numbers from Mistral seem a bit optimistic. Here is a side-by-side of a few models (our benchmark vs. the accuracy Mistral reports):
model | omni benchmark | Mistral-reported
gemini | 86% | 89%
azure | 85% | 89%
gpt-4o | 75% | 89%
google | 68% | 83%
Currently adding the Mistral API and we'll get results out today!
[1] https://github.com/getomni-ai/benchmark
[2] https://huggingface.co/datasets/getomni-ai/ocr-benchmark
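
To make the gap in the table concrete, here is a minimal Python sketch that computes, per model, how many percentage points the second column exceeds the first. It assumes the "omni" column is the benchmark result and the "mistral" column is the figure Mistral reported for the same model; the function names are illustrative only and are not part of the benchmark repo.

```python
# Accuracy figures copied from the table above, in percent.
# Assumption: first value = omni benchmark, second = Mistral-reported figure.
results = {
    "gemini": (86, 89),
    "azure": (85, 89),
    "gpt-4o": (75, 89),
    "google": (68, 83),
}


def gaps(table):
    """Return {model: mistral_reported - omni} in percentage points."""
    return {model: mistral - omni for model, (omni, mistral) in table.items()}


def mean_gap(table):
    """Average gap across all models, in percentage points."""
    g = gaps(table)
    return sum(g.values()) / len(g)


if __name__ == "__main__":
    # Print models with the largest discrepancy first.
    for model, gap in sorted(gaps(results).items(), key=lambda kv: kv[1], reverse=True):
        print(f"{model:8s} +{gap} pts")
    print(f"mean gap: {mean_gap(results):.1f} pts")
```

On these four rows the discrepancy is small for gemini and azure (3 and 4 points) and much larger for gpt-4o and google (14 and 15 points), so the disagreement is concentrated in two of the models rather than spread evenly.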