Hacker News
by recuter 1300 days ago
> Source: I work on developing a competing OCR service and we keep an eye on the competition (e.g. aside from Google, solutions by Azure, Amazon, Abbyy, Nuance, Cloudmersive, etc., as well as our internal product of course, which is not available externally), and they are (almost) all significantly better than Tesseract.

Great. How do you quantify it and keep track? Is there an industry standard benchmark?

Would you consider sharing a Backblaze-style analysis (they track consumer hard drive performance, and blogging about it got them a lot of attention and customers)?

1 comment

Sorry for the late answer.

The short answer is: we can't and we don't. Most EULAs explicitly prohibit users from benchmarking results, and we don't want to incur any such risk. Plus, since we develop a competing product, any "deep look" into the competition might be seen as reverse engineering it, and our company is very careful to avoid such problems.

Our company has dedicated teams that evaluate competing products, so we once asked them (a couple of years ago) and could only look at aggregated, anonymized results. But the patterns were very clear. Anecdotal experience (mostly from customers of ours who compare our internal engine with alternatives themselves) suggests that most competitors' services are rather stable, so quality likely hasn't changed much in the last two years, but we can't be sure, of course.

We constantly track our own accuracy on internally developed benchmarks because, frankly, the ones available online (including those meant for research) are very bad. But as I said, for legal reasons we can only continuously test our own engine and open-source ones (like Tesseract).
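
To give a rough idea of what "tracking accuracy on a benchmark" can mean for OCR, here is a minimal sketch of character error rate (CER), a commonly used metric: the edit distance between the engine's output and the ground-truth text, divided by the length of the ground truth. This is only an illustration, not a description of our internal benchmark; the function names and the sample strings are made up for the example.

```python
from typing import Iterable, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER for one document: edits needed divided by ground-truth length."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


def corpus_cer(pairs: Iterable[Tuple[str, str]]) -> float:
    """CER over many (ground_truth, ocr_output) pairs, weighted by text length."""
    pairs = list(pairs)
    total_chars = sum(len(ref) for ref, _ in pairs)
    if total_chars == 0:
        raise ValueError("corpus has no ground-truth characters")
    total_edits = sum(levenshtein(ref, hyp) for ref, hyp in pairs)
    return total_edits / total_chars


if __name__ == "__main__":
    # Hypothetical sample: ground truth vs. what an OCR engine returned.
    samples = [("Invoice 2021", "Invoice 2O21"), ("Total: 42", "Total: 42")]
    print(f"corpus CER: {corpus_cer(samples):.4f}")
```

Running the same script against every new build of an engine and logging the number over time is the simplest form of continuous tracking; real benchmarks typically also look at word-level errors and layout.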

Thank you kindly. :)