by araghuvanshi
1200 days ago
Thank you! We've wondered the same. There are a few useful open-source models out there (doctr and TrOCR, to name a couple), but our best guess is that it comes down to the relative scarcity of good, public OCR datasets, especially for PDFs. A quick-and-dirty search on paperswithcode.com shows only 33 OCR datasets out of ~7,800 in total. That said, we've seen people have success with the models I mentioned working out of the box, and I know of two folks who've fine-tuned a model to do what they need.