What are the chances? I've just started the live beta of https://fixpdfs.com, which converts PDFs of scanned documents/books into better "documents" with OCR, normalized margins, etc. (for better reading, searching, and highlighting).
In my experience, OCRing scanned PDFs results in many small errors. For example, “Alt” could be interpreted as “A|t”. Did you have those problems? How did you fix them? What about other languages?
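To make the question concrete, here is the kind of naive cleanup pass I have in mind. It's only a sketch in Python: the function name is made up, it handles just the pipe-for-"l" confusion, and it assumes ASCII letters, so it says nothing about other languages.

```python
import re

# Hypothetical helper, for illustration only.
# Matches a "|" that has an ASCII letter directly on both sides.
_PIPE_BETWEEN_LETTERS = re.compile(r"(?<=[A-Za-z])\|(?=[A-Za-z])")

def fix_pipe_confusion(text: str) -> str:
    """Replace a pipe sandwiched between two letters with a lowercase 'l'."""
    return _PIPE_BETWEEN_LETTERS.sub("l", text)

print(fix_pipe_confusion("Press A|t to continue"))  # prints "Press Alt to continue"
```

A pipe with spaces around it (as in "a | b") is left alone, but a legitimate "x|y" would be turned into "xly", which is why I doubt a regex pass like this is enough on its own.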
I didn't build my own OCR models. In the beta I'm using Tesseract, but I'm going to use Google or Amazon when I start charging. There's no way to compete on OCR quality, but I don't see other products automatically fixing doc scans, which is the value add I see my software really giving...