	by 0x62 475 days ago
	I’d imagine their capabilities mirror those of Mistral OCR [1]. Mistral outputs markdown, so the image would have to be convertible to a reasonably useful markdown structure (charts, tables, etc.). [1] https://mistral.ai/en/news/mistral-ocr

2 comments

themanmaran 475 days ago

This highlights the biggest issue I've found with Mistral OCR. Many of the documents I upload are entirely classified as images, which means no OCR is being run.

Pretty much anything with a different colored background gets returned as (image)[image_001].

Example: https://omni-demo-data.s3.us-east-1.amazonaws.com/test/17398...
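A rough way to catch this failure mode is to flag pages whose output contains nothing except image placeholders. The sketch below assumes each page's OCR result is already available as a Markdown string and that placeholders use standard Markdown image syntax (`![alt](target)`). The names `is_image_only` and `image_only_pages` are hypothetical helpers, not part of any OCR API.

```python
import re

# Standard Markdown image syntax: ![alt](target).
# Assumption: the OCR output marks unprocessed regions this way.
IMAGE_REF = re.compile(r"!\[[^\]]*\]\([^)]*\)")


def is_image_only(markdown: str) -> bool:
    """Return True if the page has at least one image reference and no other text."""
    if not IMAGE_REF.search(markdown):
        return False
    # Strip every image reference; anything left over is real extracted text.
    remainder = IMAGE_REF.sub("", markdown)
    return remainder.strip() == ""


def image_only_pages(pages: list[str]) -> list[int]:
    """Return the indexes of pages whose Markdown is nothing but image references."""
    return [i for i, md in enumerate(pages) if is_image_only(md)]
```

Pages flagged this way could then be sent to a fallback OCR or CV pipeline instead of being silently accepted as empty.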


mtillman 475 days ago

LLMs tend to be a hammer in search of a nail when it comes to documents that have imagery. We decided on CV models, which results in a high-90s midpoint for the docs our customers care about. If you can afford to go with a CV pipeline, it can outperform all of the LLMs by some margin.


bilater 474 days ago

Yes - unfortunately it seems they don't read images in the PDF.
