by dunham 3973 days ago
I usually use "pdftotext -layout" and write Python or Perl code to handle the table extraction.
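A minimal Python sketch of that first approach. It assumes poppler-utils' pdftotext is on the PATH, and it uses a simple heuristic that is an assumption, not something the tool guarantees: in -layout output, table columns are separated by runs of two or more spaces, while words inside a cell are separated by single spaces. The function names are illustrative only.

```python
import re
import subprocess

def pdf_to_layout_text(pdf_path: str) -> str:
    """Run pdftotext in layout mode and return its output as a string."""
    # Passing "-" as the output file makes pdftotext write to stdout.
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# Assumed heuristic: two or more spaces mark a column boundary.
COLUMN_GAP = re.compile(r" {2,}")

def split_table_rows(text: str) -> list[list[str]]:
    """Split layout-mode text into rows of cell strings."""
    rows = []
    for line in text.splitlines():
        stripped = line.strip()  # also drops the form feeds between pages
        if not stripped:
            continue  # skip blank lines
        rows.append(COLUMN_GAP.split(stripped))
    return rows
```

For example, split_table_rows(pdf_to_layout_text("report.pdf")) would turn a line like "Widget A    3    1.50" into ["Widget A", "3", "1.50"]. Real tables with wrapped cells or ragged alignment usually need per-document tweaks on top of this.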

If I need more detailed formatting information, I use "pdftohtml -xml -fullfontname" and process the resulting XML.
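A rough sketch of processing that XML in Python with the standard library. It assumes the usual pdftohtml -xml layout: a pdf2xml root containing page elements, fontspec elements (id, size, family) and text elements carrying top, left and font attributes, with -fullfontname keeping the full font name in family. Grouping cells into rows by an exact top value is a simplifying assumption; real documents often need a small tolerance. The class and function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class TextRun:
    page: int
    top: int
    left: int
    font_family: str
    font_size: float
    text: str

def parse_pdftohtml_xml(xml_text: str) -> list[TextRun]:
    """Extract positioned text runs, with their font info, from pdftohtml -xml output."""
    root = ET.fromstring(xml_text)
    # Collect every fontspec first; ids are referenced by text elements.
    fonts = {
        fs.get("id"): (fs.get("family", ""), float(fs.get("size", "0")))
        for fs in root.iter("fontspec")
    }
    runs = []
    for page in root.iter("page"):
        page_no = int(page.get("number"))
        for el in page.iter("text"):
            family, size = fonts.get(el.get("font"), ("", 0.0))
            runs.append(TextRun(
                page=page_no,
                top=int(float(el.get("top"))),
                left=int(float(el.get("left"))),
                font_family=family,
                font_size=size,
                text="".join(el.itertext()),  # flattens inline <b>/<i> tags
            ))
    return runs

def group_rows(runs: list[TextRun]) -> list[list[str]]:
    """Group runs sharing a page and top coordinate into rows, ordered left to right."""
    rows: dict[tuple[int, int], list[TextRun]] = {}
    for run in runs:
        rows.setdefault((run.page, run.top), []).append(run)
    return [
        [r.text for r in sorted(row, key=lambda r: r.left)]
        for _, row in sorted(rows.items())
    ]
```

Because each run keeps its font family and size, header rows or bold labels can be told apart from body cells, which is the kind of detail the plain-text route loses.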