by bambax 473 days ago
Where did you test it? At the end of the post they say:

> Mistral OCR capabilities are free to try on le Chat

but when asked, Le Chat responds:

> can you do ocr?

> I don't have the capability to perform Optical Character Recognition (OCR) directly. However, if you have an image with text that you need to extract, you can describe the text or provide details, and I can help you with any information or analysis related to that text. If you need OCR functionality, you might need to use a specialized tool or service designed for that purpose.

Edit: Tried anyway by attaching an image; it said it could do OCR and then output completely random text that had absolutely nothing to do with the text in the image! Concerning.

Tried again with a higher-resolution image; it output only the first twenty words or so of the page.

Did you try using the API?


Yes I used the API. They have examples here:

https://docs.mistral.ai/capabilities/document/

I base64-encoded an image of the PDF page. The output was an object containing the markdown and the coordinates of the images:

[OCRPageObject(index=0, markdown='![img-0.jpeg](img-0.jpeg)', images=[OCRImageObject(id='img-0.jpeg', top_left_x=140, top_left_y=65, bottom_right_x=2136, bottom_right_y=1635, image_base64=None)], dimensions=OCRPageDimensions(dpi=200, height=1778, width=2300))] model='mistral-ocr-2503-completion' usage_info=OCRUsageInfo(pages_processed=1, doc_size_bytes=634209)
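For anyone trying to reproduce this, here is a minimal sketch of the base64 step described above, using only the Python standard library. The payload shape (`type`, `image_url`) and the `mistral-ocr-latest` model name are assumptions based on the linked docs, not something confirmed in this thread. `build_ocr_request` is a hypothetical helper, and nothing here actually calls the API.

```python
import base64
import mimetypes


def to_data_url(image_bytes: bytes, filename: str) -> str:
    """Encode raw image bytes as a base64 data URL."""
    # Guess the MIME type from the file name; fall back to a generic type.
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        mime = "application/octet-stream"
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_ocr_request(image_bytes: bytes, filename: str) -> dict:
    """Hypothetical helper: assemble a request body for an OCR call.

    Assumption: the field names and model name follow the examples in
    the linked docs; check the current API reference before relying on them.
    """
    return {
        "model": "mistral-ocr-latest",  # assumed model alias
        "document": {
            "type": "image_url",
            "image_url": to_data_url(image_bytes, filename),
        },
    }
```

Building the body separately from sending it makes it easy to print and compare against the docs when the response looks wrong.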

Any luck with this? I'm trying to process photos of paperwork (.pdf, .png) and got the same results as you.

Feels like something is missing in the docs, or in the API itself.

https://imgur.com/a/1J9bkml
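In the response quoted earlier, the page markdown is nothing but an image reference, which suggests the page was returned as a picture rather than as extracted text. A small sketch for flagging such pages when batch-processing scans follows. `is_image_only` is a hypothetical helper name, not part of any Mistral client, and it only inspects the markdown string.

```python
import re

# Matches a markdown image reference such as ![img-0.jpeg](img-0.jpeg)
IMAGE_REF = re.compile(r"!\[[^\]]*\]\([^)]*\)")


def is_image_only(markdown: str) -> bool:
    """Return True if the markdown has no text besides image references.

    An empty string also counts as image-only, since no text was extracted.
    """
    remaining = IMAGE_REF.sub("", markdown)
    return remaining.strip() == ""
```

Pages flagged this way could be retried at a different resolution or routed to another OCR tool.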