Hacker News
by mythrwy 1308 days ago
There is a command-line utility (pdf2txt.py) that will also parse the PDF to an XML tree, which you can then query with XPath. I found it works well.

https://pdfminersix.readthedocs.io/en/latest/reference/comma...
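For anyone who wants to try it, here is a rough sketch of that workflow. It assumes you have already produced the XML with something like `pdf2txt.py -t xml -o out.xml input.pdf`. The sample string below is hand-written to approximate that output (one `<text>` element per character inside `<textline>`/`<textbox>`/`<page>`), so check it against your real output. The helper name `textbox_strings` is made up for illustration, and the queries use the XPath subset in Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hand-written sample approximating `pdf2txt.py -t xml` output (an assumption;
# real output carries more attributes and may differ in detail).
SAMPLE_XML = """<pages>
<page id="1" bbox="0.000,0.000,612.000,792.000" rotate="0">
<textbox id="0" bbox="72.000,700.000,90.000,712.000">
<textline bbox="72.000,700.000,90.000,712.000">
<text font="Helvetica" bbox="72.000,700.000,80.000,712.000" size="12.000">H</text>
<text font="Helvetica" bbox="80.000,700.000,86.000,712.000" size="12.000">i</text>
</textline>
</textbox>
<textbox id="1" bbox="72.000,680.000,90.000,692.000">
<textline bbox="72.000,680.000,90.000,692.000">
<text font="Helvetica" bbox="72.000,680.000,80.000,692.000" size="12.000">O</text>
<text font="Helvetica" bbox="80.000,680.000,88.000,692.000" size="12.000">K</text>
</textline>
</textbox>
</page>
</pages>"""


def textbox_strings(xml_text, page_id):
    """Return the joined text of each <textbox> on the given page (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # ElementTree supports a limited XPath: descendant search plus attribute predicates.
    boxes = root.findall(f".//page[@id='{page_id}']/textbox")
    # Each character is its own <text> element, so join them per textbox.
    return ["".join(t.text or "" for t in box.iter("text")).strip() for box in boxes]


if __name__ == "__main__":
    for line in textbox_strings(SAMPLE_XML, "1"):
        print(line)
```

For a real file, swap `ET.fromstring(SAMPLE_XML)` for `ET.parse("out.xml").getroot()`. If you need full XPath 1.0 (axes, functions), a library such as lxml is probably a better fit than the standard library.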

1 comment

That makes sense, as "pdfquery" uses pdfminer.six as a dep: https://github.com/jcushman/pdfquery/blob/master/requirement...