
As I see it, OCR is like taking all the pieces of a jigsaw puzzle and putting them back together. So the key is whether you can understand the picture.

Traditional OCR models can't understand context, and that's the gap cloud LLMs filled.

BTW, some local models, like MinerU, are already quite good. But they're too heavy to run on a consumer-grade PC, so it's hard to strike a balance.