by deepsquirrelnet 309 days ago
Give the nanonets-ocr-s model a try. It’s a fine-tune of Qwen 2.5 VL that I’ve had good success with for Markdown and LaTeX output with image captioning. It uses a simple tagging scheme for page numbers, captions, and tables.

I've tried nanonets, but it seems very sensitive to the prompt: changing it slightly turned the output to rubbish. When it worked, it was pretty good.
This is true. It’s not meant to be run with any prompt other than the one it was trained with; I found that out as well. It’s only meant for OCR. Qwen 2.5 VL is better if you need the option of changing the prompt.
I desperately wanted Qwen VL to work, but it just unleashes rambling hallucinations on basic screencaps. Going to try nanonets!