by lhuser123 1236 days ago
In my experience, OCRing scanned PDFs results in many small errors. For example, “Alt” could be interpreted as “A|t”. Did you have those problems? How did you fix them? What about other languages?
1 comment

I didn't build my own OCR models. In the beta I'm using Tesseract, but I'm going to switch to Google or Amazon when I start charging. There's no way to compete on OCR quality, but I don't see other products automatically fixing doc scans, which is the value add I see my software really giving...
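
For the “A|t” kind of error mentioned in the question, here is a rough sketch of a post-processing pass over OCR output. This is not what the product does; the function name `fix_pipe_confusion` and the rule itself (a pipe sitting between two letters is probably a misread “l”) are assumptions made for illustration only.

```python
import re

# Assumption: a "|" with an ASCII letter directly on both sides is a misread "l".
# Pipes next to spaces, digits, or other pipes are left untouched.
_PIPE_IN_WORD = re.compile(r"(?<=[A-Za-z])\|(?=[A-Za-z])")


def fix_pipe_confusion(text: str) -> str:
    """Replace pipes embedded inside words with a lowercase 'l'."""
    return _PIPE_IN_WORD.sub("l", text)


if __name__ == "__main__":
    print(fix_pipe_confusion("Press A|t to open the menu"))
```

A rule this narrow misses a pipe at the start or end of a word and only handles ASCII letters, so other languages would need their own character classes or a dictionary check.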