HN Mirror

by hardwaresofton 1244 days ago

What are the chances... This is exactly the idea from the latest edition of Unvalidated Ideas[0][1] I released this week.

Of course, I imagined different kinds of integrations, but maybe this is a case of great minds thinking alike!

[0]: https://unvalidatedideas.com/editions/latest

[1]: https://unvalidatedideas.com/editions/39

3 comments

jcuenod 1244 days ago

What are the chances? I've just started the live beta of https://fixpdfs.com, which converts PDFs of scanned documents/books into better "documents" with OCR, normalized margins, etc. (for better reading, searching, and highlighting).

lhuser123 1244 days ago

In my experience, OCRing scanned PDFs results in many small errors. For example, “Alt” could be interpreted as “A|t”. Did you have those problems? How did you fix them? What about other languages?

jcuenod 1244 days ago

I didn't build my own OCR models. In the beta I'm using Tesseract, but I'm going to use Google or Amazon when I start charging. There's no way to compete on OCR quality, but I don't see other products automatically fixing doc scans, which is the value add I see my software really giving...

KeplerBoy 1244 days ago

With the small restriction that conversion from PDF is one of the few things pandoc does not do.

heywhatupboys 1244 days ago

yes, you obviously came up with the novel idea of PDF conversion!
