HN Mirror

by cdolan 476 days ago

Were you guys able to finish running the benchmark with Mistral, and did it get a 70% score? I missed that.

Edit - I see it on the Benchmark page now. Woof, low 70% scores in some areas!

https://getomni.ai/ocr-benchmark

themanmaran 476 days ago

Yup, surprising results! We were able to dig in a bit more. The main culprit is the overzealous "image extraction", where if Mistral classifies something as an image, it will replace the entire section with (image)[image_002).

And it happened with a lot of full documents as well. Ex: most receipts got classified as images, and so it didn't extract any text.

cdolan 476 days ago

This sounds like a real problem and hurdle for North American (US/CAN in particular) invoice and receipt processing?

lingjiekong 476 days ago

Where did you find this: "Where if Mistral classifies something as an image, it will replace the entire section with (image)[image_002)."?

culi 476 days ago

themanmaran works at Omni, so presumably they have access to the actual result data from this study.
