Hacker News new | ask | show | jobs
by booder1 397 days ago
5.5.0 released November last year. Still a very active project as far as I can tell and runs on CPU. Even compared to best open source GPU option it is still pretty good. VLMs work very differently and don't work as well for everything. Why is it out of date?
1 comments

I don't know that that is true: https://researchify.io/blog/comparing-pytesseract-paddleocr-...

Using Surya gets you significantly better results and makes almost all the work detailed in the article largely unnecessary.

Surya weights for the models are licensed cc-by-nc-sa-4.0 so not free for commercial usage. Also, as far as I know, the training data is 100% unavailable. Given they use well trained, but standard models, it isn't really open source and barely, maybe, open weight. I kinda hate how their repo says gpl cause that is only true for the inference code. The training code is closed source.
I did not know that the training code is closed source. That is troubling.