Hacker News
by jcuenod 988 days ago
Good idea. But what if the quality varies by language? I read a lot of texts that mix English, French, German, Greek, and Hebrew, and the OCR quality falls off in roughly that order. If there's enough Hebrew, it gets detected reasonably well (though vowel points are less reliable). If there's not enough Hebrew, it might not get detected at all.

I'm using Google's OCR API on the backend, after applying my own secret sauce to "fix" the scan. I've found it works better than Tesseract, and I've heard it's better than Azure or AWS. Google claims to support a ton of languages, but obviously the quality varies. I can't just copy their claim because it doesn't match my own experience, so what do I say?