Hacker News (HN Mirror)


	by deepsquirrelnet 309 days ago
	Give the nanonets-ocr-s model a try. It’s a fine-tune of Qwen2.5-VL, and I’ve had good success with it for Markdown and LaTeX output, including image captioning. It uses a simple tagging scheme for page numbers, captions and tables.

davidwritesbugs 308 days ago

I've tried Nanonets, but it seems very sensitive to the prompt: changing it slightly turned the output into rubbish. When it worked, it was pretty good.

deepsquirrelnet 308 days ago

This is true. It’s not meant to be run with any prompt other than the one it was trained with; I found that out as well. It’s only meant for OCR. Qwen2.5-VL is the better choice if you need to use your own prompts.

captainregex 309 days ago

I desperately wanted Qwen VL to work, but it just unleashes rambling hallucinations on basic screencaps. Going to try Nanonets!
