by binyang_qiu 40 days ago
I really like your local-first idea. Just one question: what do you think is still the biggest weakness of local OCR models compared with cloud APIs? Is it layout parsing, reading order, or messy scans?

As I see it, OCR is like taking all the pieces of a jigsaw puzzle and putting them back together. So the key is whether you can understand the whole picture.

Traditional OCR models can't understand context, and that's what cloud LLMs do.

BTW, some local models are already quite good, like MinerU. But it's too heavy to run on a consumer-grade PC, so it's just hard to strike a balance.