by 0x62 475 days ago
I’d imagine their capabilities mirror those of Mistral OCR [1]. Since Mistral outputs markdown, the image would have to be convertible to a reasonably useful markdown structure (charts, tables, etc.).

[1] https://mistral.ai/en/news/mistral-ocr

2 comments

This highlights the biggest issue I've found with Mistral OCR. Many of the documents I upload are entirely classified as images, which means no OCR is being run.

Pretty much anything with a different colored background gets returned as (image)[image_001].

Example: https://omni-demo-data.s3.us-east-1.amazonaws.com/test/17398...
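For anyone who wants to catch this automatically, here is a rough sketch of a check for pages that came back as image placeholders with essentially no text. It only inspects the markdown string and makes no calls to any OCR API. The function name `image_only`, the 20-character threshold, and the two placeholder patterns (standard markdown image syntax plus the reversed form quoted above) are my own assumptions, so adjust them to whatever your output actually looks like.

```python
import re

# Standard markdown image syntax: ![alt](target)
MD_IMAGE = re.compile(r"!\[[^\]]*\]\([^)]*\)")
# The reversed form quoted in the comment above: (image)[image_001]
REVERSED_IMAGE = re.compile(r"\(image\)\[[^\]]*\]")


def image_only(markdown: str, min_text_chars: int = 20) -> bool:
    """Return True if the page has image placeholders but almost no text.

    Hypothetical helper: the threshold is an arbitrary starting point.
    """
    placeholders = MD_IMAGE.findall(markdown) + REVERSED_IMAGE.findall(markdown)
    # Strip the placeholders, then count the non-whitespace characters left.
    remaining = REVERSED_IMAGE.sub("", MD_IMAGE.sub("", markdown))
    text_chars = len(re.sub(r"\s+", "", remaining))
    return bool(placeholders) and text_chars < min_text_chars
```

Pages where this returns True are the ones where no OCR text came back, so those could be sent to a fallback OCR step instead of being trusted as-is.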

LLMs tend to be a hammer in search of a nail when it comes to documents that have imagery. We decided on CV models, which give us a high-90s midpoint for the docs our customers care about. If you can afford to go with a CV pipeline, it can outperform all of the LLMs by some margin.
Yes - unfortunately it seems they don't read images in the PDF.