
I was pleasantly surprised by the "OCR" results MiniCPM-V 2.6 gives on any kind of text, including handwritten, given an image and a trivial prompt. I'll be sure to keep an eye on this family of models.

It's no replacement for traditional OCR of printed text, of course, because it sometimes generates random text, but it looked very useful for handwritten text and all kinds of decorative fonts (e.g. "inspirational posters"). I imagine this could work:

  * if you're going to check the output manually or

  * somehow make it part of a pipeline where this model recognizes the rough layout of the page and, to get reliable text, you cut the page up and run traditional OCR on the blocks or

  * somehow diff the VLM output and the OCR tool output

although keep in mind that MiniCPM-V can't identify pixel positions in the image like Gemini Pro here: https://simonwillison.net/2024/Aug/26/gemini-bounding-box-vi...
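
For the "diff the VLM output and the OCR tool output" idea, here is a rough sketch of what I have in mind, using only Python's standard `difflib`. It assumes you already have both outputs as plain strings; the function names `diff_words` and `agreement_ratio` are made up for illustration and don't come from any OCR or VLM library. It compares word by word and returns only the spans where the two disagree, so a human (or a later pipeline step) only has to look at those.

```python
import difflib


def diff_words(vlm_text: str, ocr_text: str):
    """Return (tag, vlm_words, ocr_words) for every span where the outputs disagree.

    tag is one of difflib's opcode tags: 'replace', 'delete' (words only in
    the VLM output) or 'insert' (words only in the OCR output).
    """
    vlm_words = vlm_text.split()
    ocr_words = ocr_text.split()
    # autojunk=False: don't let difflib treat frequent words as junk on long pages
    matcher = difflib.SequenceMatcher(a=vlm_words, b=ocr_words, autojunk=False)
    disagreements = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            disagreements.append((tag, vlm_words[i1:i2], ocr_words[j1:j2]))
    return disagreements


def agreement_ratio(vlm_text: str, ocr_text: str) -> float:
    """Word-level similarity in [0, 1]; low values suggest one output is unreliable."""
    matcher = difflib.SequenceMatcher(
        a=vlm_text.split(), b=ocr_text.split(), autojunk=False
    )
    return matcher.ratio()


if __name__ == "__main__":
    # Hypothetical outputs for the same image, hard-coded for the example
    vlm_out = "Believe in yourself and the magic happens"
    ocr_out = "Believe in yourse1f and the magic happens"
    for tag, vlm_span, ocr_span in diff_words(vlm_out, ocr_out):
        print(tag, vlm_span, ocr_span)
    print(round(agreement_ratio(vlm_out, ocr_out), 3))
```

A long 'delete' span (text present only in the VLM output) would be a hint that the model made something up, while scattered one-word 'replace' spans look more like ordinary OCR character errors. Which side to trust in each case is still a judgment call; this only narrows down where to look.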