by alkou 4730 days ago
Do you use pdftotext internally, or something else?

We have our own parser for more complicated tasks, and we use xpdf when speed is key because it's much faster.
I am writing something similar for a client who needs the data in tables extracted from PDFs. Which language are you using? I wrote two scripts: one using Python and pdftotext, and another using Ruby's pdf-reader. The Ruby one gives each line of the PDF one by one, which is good for extraction.
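
For anyone trying the Python and pdftotext route, here is a rough sketch of how it might look. It is not the script described above. It assumes pdftotext is on the PATH, that its -layout option keeps table columns separated by at least two spaces, and that no cell contains a double space. The function names (pdf_to_lines, split_columns, extract_rows) and the min_cells threshold are made up for illustration.

```python
import re
import subprocess


def pdf_to_lines(pdf_path):
    """Run pdftotext in layout mode and return the text as a list of lines.

    Assumes the pdftotext binary is installed; "-" sends output to stdout.
    """
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


def split_columns(line):
    """Split one layout-preserved line into cells on runs of 2+ whitespace."""
    return [cell for cell in re.split(r"\s{2,}", line.strip()) if cell]


def extract_rows(lines, min_cells=2):
    """Keep only lines that look like table rows (at least min_cells cells)."""
    rows = []
    for line in lines:
        cells = split_columns(line)
        if len(cells) >= min_cells:
            rows.append(cells)
    return rows
```

Usage would be something like extract_rows(pdf_to_lines("invoice.pdf")), where the file name is a placeholder. Splitting on whitespace runs is fragile when a column is empty or the columns sit close together, so real tables may need fixed column offsets instead.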