Hacker News


	by araghuvanshi 1248 days ago
	Thank you! We've wondered the same. There are a few useful open-source models out there (doctr and TrOCR, to name a couple), but our best guess is that it comes down to the relative scarcity of good public OCR datasets, especially for PDFs. A quick-and-dirty search on paperswithcode.com shows 33 OCR datasets out of ~7800 total, or under half a percent. That said, we've seen people have success with the models I mentioned working out of the box, and I know of two folks who've fine-tuned a model to do what they need.