Hacker News
by BasHamer 2759 days ago
https://pdftables.com failed on the test file. It was pretty good, but its interpretation was inconsistent across rows: sometimes it split a cell and sometimes it did not. Tabula failed to detect multi-line rows; after I manually changed the table, it did better than pdftables.com at splitting cells. Both failed on the non-printable whitespace characters, which produced garbled output in Excel. The other one would take some time to rig up.
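
For the whitespace problem, one possible workaround is to clean each extracted cell before it goes into the spreadsheet. This is only a rough sketch: the helper name `clean_cell` is made up for illustration, and it assumes the offending characters are Unicode whitespace variants (such as a no-break space) or invisible format/control characters (such as a zero-width space), which may not match what is actually in the test file.

```python
import unicodedata


def clean_cell(text: str) -> str:
    """Normalize whitespace in one extracted table cell (hypothetical helper).

    Assumption: the garbling comes from Unicode whitespace variants and
    invisible format/control characters, not from a wrong text encoding.
    """
    kept = []
    for ch in text:
        category = unicodedata.category(ch)
        if ch.isspace() or category == "Zs":
            # Any kind of whitespace (tab, newline, no-break space, ...)
            # becomes a plain ASCII space.
            kept.append(" ")
        elif category in ("Cc", "Cf"):
            # Remaining control and format characters (zero-width space,
            # byte order mark, soft hyphen, ...) are dropped.
            continue
        else:
            kept.append(ch)
    # Collapse runs of spaces and trim both ends.
    return " ".join("".join(kept).split())


if __name__ == "__main__":
    raw = "Total\u00a0amount\u200b \t 42\ufeff"
    print(repr(clean_cell(raw)))
```

Running this over every cell of the exported CSV before opening it in Excel might be enough, but it would not help if the tools are misreading the characters themselves.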
1 comment

You can also try https://docparser.com/.

If nothing works for you and you're comfortable with sharing an example file, you can send it to me and I could take a look.