by vintermann
640 days ago
I'm not a start-up, but I want to use Llama and related models to transcribe historical handwritten documents and, if possible, to extract structured data from them that isn't directly visible in a word-for-word transcription (many of the documents are forms). I've tried many different models, but vision models are overwhelmingly oriented towards pictures rather than writing, and the results aren't good.
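A minimal sketch of the structured-extraction step, assuming a vision model that can be prompted to answer in JSON. Everything named here is hypothetical: the field names in `FIELDS`, the prompt wording, and the `ask_model(image_path, prompt)` callable, which stands in for whichever model client is used. It only shows how a form's fields could be requested and how a noisy reply could be reduced to those fields.

```python
import json
from typing import Callable, Optional

# Hypothetical form fields; a real form would define its own.
FIELDS = ["name", "birth_date", "parish"]


def build_prompt(fields: list[str]) -> str:
    """Ask for the form's fields as one JSON object (wording is an assumption)."""
    return (
        "Transcribe the handwritten form in the image. "
        "Reply with a single JSON object with exactly these keys: "
        + ", ".join(fields)
        + ". Use null for any field you cannot read."
    )


def parse_reply(reply: str, fields: list[str]) -> dict[str, Optional[str]]:
    """Pull the JSON object out of a reply that may contain surrounding prose."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model reply")
    data = json.loads(reply[start : end + 1])
    # Keep only the requested fields; missing ones become None.
    return {field: data.get(field) for field in fields}


def extract_form(
    image_path: str,
    ask_model: Callable[[str, str], str],  # hypothetical model client
    fields: list[str] = FIELDS,
) -> dict[str, Optional[str]]:
    """Send the image and prompt to the model and return the cleaned fields."""
    reply = ask_model(image_path, build_prompt(fields))
    return parse_reply(reply, fields)
```

Because vision models sometimes produce text that isn't on the page, anything returned this way would still need checking against the scan.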
It's no replacement for OCR of printed text, of course, because it sometimes generates random text, but it looked very useful for handwritten text and all kinds of decorative fonts (e.g. "inspirational posters"). I imagine this could work:
although keep in mind that MiniCPM-V can't identify pixel positions in the image the way Gemini Pro does here: https://simonwillison.net/2024/Aug/26/gemini-bounding-box-vi...