Hacker News
by hakunin 12 days ago
A SQLite-based sweeper of all the scans, notes, PDFs and images I have on my filesystem. It stores their paths and allows searching their OCR’ed descriptions and text, as provided by Mistral OCR. I can ask things like “when does my car need maintenance” or “find me that picture my kid drew for Mother’s Day”. I use a pi-based bash executable to launch a doc chat like that. https://github.com/maxim/ringbinder
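For anyone curious how the "store paths, search OCR text" part can work, here is a minimal sketch. It is not taken from ringbinder: the table name `docs` and the helpers `open_index`, `add_document` and `search` are made up for illustration, and it assumes your SQLite build includes the FTS5 extension. It only does keyword search; the natural-language questions above would still go through the chat layer.

```python
import sqlite3


def open_index(db_path=":memory:"):
    """Open (or create) a full-text index. Assumes SQLite was built with FTS5."""
    conn = sqlite3.connect(db_path)
    # 'path' is stored but not tokenized; 'body' holds the OCR'ed text.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS docs USING fts5(path UNINDEXED, body)"
    )
    return conn


def add_document(conn, path, ocr_text):
    """Store one file's path together with whatever text the OCR step produced."""
    conn.execute("INSERT INTO docs (path, body) VALUES (?, ?)", (path, ocr_text))
    conn.commit()


def search(conn, query, limit=5):
    """Return paths of matching documents, best match first.

    The query uses FTS5 syntax: plain words separated by spaces must all
    appear in the document. Punctuation in the query may need quoting.
    """
    rows = conn.execute(
        "SELECT path FROM docs WHERE docs MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [row[0] for row in rows]
```

Usage would look like `add_document(conn, "/scans/car.pdf", text)` after each OCR call, then `search(conn, "car maintenance")` to get the matching paths back.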
2 comments

Nice! I did something similar for myself but fully offline. It started because I had to do a tax return and collecting information was a pain point.
Yeah, for me it was all about not letting mail and kids' schoolwork pile up. I like to scan ads that local businesses leave in my mailbox, so that I can ask "show me lawn care services near me". A lot of them don't really have any other online presence.

Btw, I tried to keep the Mistral part modular, so that another OCR could be integrated.
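One way such a modular OCR boundary could look, as a rough sketch only. The names `OcrBackend`, `CannedOcr` and `scan_file` are hypothetical and do not come from ringbinder; the idea is just that the sweeper depends on a single "file in, text out" method, so any OCR engine that provides it can be dropped in.

```python
from typing import Protocol


class OcrBackend(Protocol):
    """Anything that can turn a file on disk into searchable text."""

    def extract_text(self, path: str) -> str: ...


class CannedOcr:
    """Stand-in backend for illustration: returns pre-made text per path."""

    def __init__(self, canned: dict[str, str]) -> None:
        self.canned = canned

    def extract_text(self, path: str) -> str:
        # A real backend would call an OCR engine or API here.
        return self.canned.get(path, "")


def scan_file(backend: OcrBackend, path: str) -> tuple[str, str]:
    """Return the (path, text) pair that a sweeper would store."""
    return path, backend.extract_text(path)
```

A hosted OCR or an offline one would then each be a small class with the same `extract_text` method, and the rest of the pipeline would not need to change.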

How does Mistral OCR work for you? Is it really better than Tesseract?
IME way better. It may not be the best out there, but it's cheap (2c per page), fast, has an easy-to-integrate API, and is sufficient for my needs. It does things like describe what's drawn in pictures and shown in graphs, which all helps when searching.
Great! Have you tested how well it returns coordinates of objects/text? I've tried with generic LLMs like Gemini/Qwen/Gemma, and they are all unstable with coordinates around text, though better when using visual grounding.
Yeah, for perfect positioning/overlaying I would be much stricter with my requirements. For that type of OCR I used Apple’s own LiveText framework that comes with macOS. But in this use case I only care about standalone plain text and descriptive text to store in the database, not overlaying it on the original content, so I never tested Mistral on that front.