| HN Mirror


by kapitalx 472 days ago

Yes, I used the API. They have examples here:

https://docs.mistral.ai/capabilities/document/

I sent a base64-encoded image of the PDF page. The output was an object containing the markdown and the coordinates of the images:

[OCRPageObject(index=0, markdown='![img-0.jpeg](img-0.jpeg)', images=[OCRImageObject(id='img-0.jpeg', top_left_x=140, top_left_y=65, bottom_right_x=2136, bottom_right_y=1635, image_base64=None)], dimensions=OCRPageDimensions(dpi=200, height=1778, width=2300))] model='mistral-ocr-2503-completion' usage_info=OCRUsageInfo(pages_processed=1, doc_size_bytes=634209)
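A rough sketch of how one might flag this kind of result automatically, in case it helps anyone hitting the same thing: the markdown contains nothing but an image reference, and that image's bounding box covers most of the page. The `Image` and `Page` classes below are hypothetical stand-ins that copy only the fields visible in the output above; they are not the client library's own types, and the helper names are made up for illustration.

```python
import re
from dataclasses import dataclass


# Hypothetical stand-ins mirroring only the fields shown in the output above.
@dataclass
class Image:
    id: str
    top_left_x: int
    top_left_y: int
    bottom_right_x: int
    bottom_right_y: int


@dataclass
class Page:
    markdown: str
    images: list
    width: int
    height: int


# Matches markdown image references such as ![img-0.jpeg](img-0.jpeg)
IMAGE_REF = re.compile(r"!\[[^\]]*\]\([^)]*\)")


def image_coverage(page):
    """Fraction of the page area covered by image boxes (overlaps are not merged)."""
    area = sum(
        (im.bottom_right_x - im.top_left_x) * (im.bottom_right_y - im.top_left_y)
        for im in page.images
    )
    return area / (page.width * page.height)


def is_image_only(page):
    """True when the markdown holds nothing but image references."""
    return bool(page.images) and IMAGE_REF.sub("", page.markdown).strip() == ""


# Values copied from the response above.
page = Page(
    markdown="![img-0.jpeg](img-0.jpeg)",
    images=[Image("img-0.jpeg", 140, 65, 2136, 1635)],
    width=2300,
    height=1778,
)
print(is_image_only(page), round(image_coverage(page), 2))
```

With the numbers from the response above, the single image box covers about 77% of the page and the markdown has no text of its own, so the page was effectively returned as one picture rather than transcribed.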

1 comment

sadcrab 472 days ago

Any luck with this? I'm trying to process photos of paperwork (.pdf, .png) and got the same results as you.

Feels like something is missing in the docs, or in the API itself.

https://imgur.com/a/1J9bkml
