

by souvik3333 288 days ago

We built DocStrange to turn images and PDFs into LLM-ready data. We have also open-sourced a fine-tuned 3B model. You can try both the open-source and private models in the demo.

HF: https://huggingface.co/nanonets/Nanonets-OCR2-3B
Demo: https://docstrange.nanonets.com/
Blog: https://nanonets.com/research/nanonets-ocr-2/

This model is an improvement over our last open-source model. We have fixed some of the issues the community ran into and added some of the features that were requested (handwriting and multilingual support).

The models are trained on 3 million documents, including handwritten documents, financial reports, complex tables, and documents with watermarks and stamps. Feel free to try them and share feedback.

2 comments

AdityaNahata 288 days ago

Do you also provide API support? I am processing documents for a project.


souvik3333 288 days ago

Yeah, we do have API support. Currently, you can process 10k documents per month for free. Let me know if you run into any issues.
