by hardwaresofton 1244 days ago
What are the chances... This is exactly the idea from the latest edition of Unvalidated Ideas[0][1] I released this week.

Of course I imagined different kinds of integrations, but maybe this is a case of great minds thinking alike!

[0]: https://unvalidatedideas.com/editions/latest

[1]: https://unvalidatedideas.com/editions/39

3 comments

What are the chances? I've just started my live beta of https://fixpdfs.com to convert PDFs of scanned documents/books into better "documents" with OCR, normalized margins, etc. (for better reading, searching, and highlighting).
In my experience, OCRing scanned PDFs results in many small errors. For example, “Alt” could be interpreted as “A|t”. Did you have those problems? How did you fix them? What about other languages?
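To make the "A|t" case concrete, here is a minimal sketch of one possible post-processing rule. It assumes that a "|" sitting directly between two letters is a misread "l", which will not hold for every document. The function name `fix_pipe_confusion` is hypothetical and not taken from any OCR tool.

```python
import re

# Assumption: a '|' with a letter immediately on both sides is a misread 'l'.
# This is a heuristic for illustration, not a general OCR correction method.
_PIPE_IN_WORD = re.compile(r"(?<=[A-Za-z])\|(?=[A-Za-z])")


def fix_pipe_confusion(text: str) -> str:
    """Replace '|' with 'l' only when it appears inside a word."""
    return _PIPE_IN_WORD.sub("l", text)


if __name__ == "__main__":
    print(fix_pipe_confusion("Press A|t to continue"))
    print(fix_pipe_confusion("a | b"))  # standalone pipes are left alone
```

A rule like this only covers one confusion pair in Latin-script text. Other languages and other character confusions would need their own rules or a dictionary-based check.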
I didn't build my own OCR models. In the beta I'm using Tesseract, but I'm going to use Google or Amazon when I start charging. There's no way to compete on OCR quality, but I don't see other products automatically fixing doc scans, which is the value add I see my software really giving...
With the small restriction that conversion from PDF is one of the few things pandoc does not do.
yes, you obviously came up with the novel idea of PDF conversion!