by barfbagginus 806 days ago
Can I send a PR extending the benchmark against doctr and potentially textract? I believe these represent the SOTA for open and proprietary OCR.

The benefit is to let people evaluate surya against the open source and commercial SOTA, improving the integrity and applicability of the benchmark in a business or research setting.

There's a risk: it could make surya's benchmark look less attractive. Also, picking textract to represent the proprietary SOTA might be dicey, since it has competitors (Google Cloud OCR, Azure OCR).

Still, ranking surya against doctr, textract, and tesseract would be a really nice baseline. As a research user, business user, or open source contributor, those are the results I need to quickly understand surya's potential.

1 comment

I've benchmarked against google cloud ocr, but the results are on Twitter, not the repo yet - https://twitter.com/VikParuchuri/status/1765440195124691339 . The reason I didn't benchmark against doctr is language support.