by TZubiri 483 days ago
The issue with that promise is that anyone can convert PDFs; the question is whether the conversions are correct, or whether you have

Income Expenses 200 100

on one document and

Income Expenses 20 0100

on others.

There's no shortage of products that tried to solve this problem from scratch (or by piggybacking on other projects) and called it a day, without worrying about the much bigger problem of output quality and parseability.

The most robust players just give you the coordinates of each glyph, and from there you're on your own: Textract, PDFBox.
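To make "you're on your own" concrete, here is a minimal Python sketch of the post-processing you end up writing once an extractor hands you per-glyph boxes. The `Glyph` record, the `place` layout helper and all coordinates are hypothetical stand-ins, not the actual output format of Textract or PDFBox. The heuristics (cluster glyphs into lines by baseline, break words at a fixed horizontal gap) are just one common naive approach.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """One character plus its box (hypothetical fields, not a real library's schema)."""
    char: str
    x0: float  # left edge
    x1: float  # right edge
    y: float   # baseline


def place(text, x, y, width=5.0):
    """Hypothetical layout helper: spaces advance x but emit no glyph."""
    out = []
    for ch in text:
        if ch != " ":
            out.append(Glyph(ch, x, x + width, y))
        x += width
    return out


def group_lines(glyphs, y_tol=2.0):
    """Cluster glyphs into lines by baseline, then order each line left to right."""
    lines = []
    for g in sorted(glyphs, key=lambda g: (g.y, g.x0)):
        if lines and abs(lines[-1][0].y - g.y) <= y_tol:
            lines[-1].append(g)
        else:
            lines.append([g])
    return [sorted(line, key=lambda g: g.x0) for line in lines]


def split_words(line, gap):
    """Start a new token whenever the horizontal gap between glyphs exceeds `gap`."""
    words = []
    current = line[0].char
    for prev, cur in zip(line, line[1:]):
        if cur.x0 - prev.x1 > gap:
            words.append(current)
            current = cur.char
        else:
            current += cur.char
    words.append(current)
    return words


def extract(glyphs, gap=3.0):
    """Turn raw glyph boxes into one string per visual line."""
    return [" ".join(split_words(line, gap)) for line in group_lines(glyphs)]


# Two made-up documents with the same content but different spacing.
doc_a = place("Income Expenses", 0, 10) + place("200   100", 0, 30)
doc_b = place("Income Expenses", 0, 10) + place("20", 0, 30) + place("0100", 14, 30)

print(extract(doc_a))  # ['Income Expenses', '200 100']
print(extract(doc_b))  # ['Income Expenses', '20 0100']
```

The same threshold that is right for `doc_a` is wrong for `doc_b`, and tuning it does not help: in `doc_b` the only gap in the numbers row is between "20" and "0100", so any `gap` below 4 gives `20 0100` and any value of 4 or above merges the digits into `200100`. Nothing in the glyph coordinates alone tells you where the column boundary is; that knowledge has to come from somewhere else.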