| HN Mirror

	by agentcoops 308 days ago
	100%. My sense is that many in this thread have never gone through the misery of trying to use classical OCR on non-English documents or on scans whose quality you can't control. I recently ran a test on 18th-century German documents, written in a well-known and standardized but archaic script. The accuracy of classical models specifically trained on this corpus was an order of magnitude lower than GPT-5's. I haven't experimented personally or professionally with smaller models, but your experience makes me hopeful that we might even get OCR this accurate on phones sooner rather than later...

1 comment

William Mattingly has been doing a lot of work on similar documents in an archival context with VLLMs. You should check in on their work: