by themanmaran 475 days ago
This highlights the biggest issue I've found with Mistral OCR. Many of the documents I upload are entirely classified as images, which means no OCR is being run.

Pretty much anything with a different colored background gets returned as (image)[image_001].

Example: https://omni-demo-data.s3.us-east-1.amazonaws.com/test/17398...
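One way to catch this failure mode downstream is to flag pages whose OCR output is nothing but image placeholders. The following is a minimal sketch, not anything from Mistral's API: the placeholder patterns (standard Markdown image syntax plus the `(image)[image_001]` form quoted above), the function names, and the 20-character threshold are all assumptions made for illustration.

```python
import re

# Assumed placeholder shapes: standard Markdown images ![alt](src) and the
# "(image)[image_001]" form quoted in the comment. Adjust to the real output.
IMAGE_PLACEHOLDER = re.compile(
    r"!\[[^\]]*\]\([^)]*\)"      # ![alt](src)
    r"|\(image\)\[[^\]]*\]"      # (image)[image_001]
)

def text_chars_outside_images(page_markdown: str) -> int:
    """Count non-whitespace characters left after removing image placeholders."""
    stripped = IMAGE_PLACEHOLDER.sub("", page_markdown)
    return len("".join(stripped.split()))

def is_image_only(page_markdown: str, min_text_chars: int = 20) -> bool:
    """True when a page has at least one placeholder and almost no other text."""
    has_image = IMAGE_PLACEHOLDER.search(page_markdown) is not None
    return has_image and text_chars_outside_images(page_markdown) < min_text_chars

# Hypothetical per-page OCR output, for illustration only.
pages = [
    "(image)[image_001]",
    "# Invoice 42\n\nTotal due: 1,200.00 USD\n\n![logo](img-0.jpeg)",
]
flagged = [i for i, page in enumerate(pages) if is_image_only(page)]
print(flagged)
```

Pages flagged this way could be retried with a different OCR path instead of being passed along as empty text.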

1 comment

LLMs tend to be a hammer in search of a nail when it comes to documents that have imagery. We decided on CV models, which give us a midpoint in the high 90s for the docs our customers care about. If you can afford to go with a CV pipeline, it can outperform all of the LLMs by some margin.