by trez 4737 days ago
We have our own parser for more complicated tasks, and we use xpdf when speed is key because it's much faster.

I am writing something similar for a client who needs data extracted from the tables in his PDFs. Which language are you using? I wrote two scripts, one using Python with pdftotext and another using Ruby's pdf-reader. The Ruby one gives me the PDF's text one line at a time, which is good for extraction.
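For anyone trying the pdftotext route, here is a minimal sketch of one way to do it. It is not the script described above. It assumes the `pdftotext` binary (from xpdf or poppler) is on your PATH, runs it with `-layout` so table columns keep their horizontal alignment, and then splits each line into cells wherever there are two or more spaces. That two-space rule is a heuristic I'm assuming for illustration; it breaks on tables whose cells are separated by a single space or that wrap text across lines.

```python
import re
import subprocess

def pdf_to_text(path: str) -> str:
    """Run pdftotext with -layout and return the text.

    Assumes the pdftotext binary (xpdf or poppler) is installed and on PATH.
    The trailing "-" tells pdftotext to write to stdout instead of a file.
    """
    result = subprocess.run(
        ["pdftotext", "-layout", path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout

# Assumed heuristic: runs of two or more whitespace characters separate cells,
# so single spaces inside a cell ("Blue Widget") are preserved.
CELL_SEPARATOR = re.compile(r"\s{2,}")

def split_table_rows(text: str) -> list[list[str]]:
    """Turn layout-preserved text into rows of cells, skipping blank lines."""
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        rows.append(CELL_SEPARATOR.split(stripped))
    return rows

if __name__ == "__main__":
    # Stand-in for pdf_to_text("invoice.pdf") so the example runs without a PDF.
    sample = "Name          Qty   Price\nBlue Widget   3     9.99\n\nGadget        10    24.50\n"
    for row in split_table_rows(sample):
        print(row)
```

Keeping the subprocess call separate from the splitting step means you can tune the cell-splitting rule against saved text output without re-running pdftotext each time.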