by verdverm 652 days ago
I've been using several Python libraries for working with PDFs. At least one of them allows you to walk the AST. (will look up in a bit and edit this comment)

I've been using pypdf for working with PDFs in Python. My uses are pretty humble. I create Jupyter notebooks for managing sheet music that I receive in PDF format, which lets me do things like break up a book of tunes into individual files. This in turn makes it easier to pull up individual tunes on my tablet during a performance. It also looks like you can treat the PDF as a tree structure, and I've used that feature to write some recursive functions.
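For anyone curious what that might look like, here is a rough sketch, not the notebook code described above. It assumes the tune book has a bookmark outline with one top-level entry per tune. The helper names (walk_outline, page_ranges, split_by_outline) are made up for illustration; PdfReader, PdfWriter, reader.outline and get_destination_page_number are pypdf's own.

```python
from pathlib import Path


def walk_outline(outline, depth=0):
    """Recursively yield (depth, item) for every entry in a nested outline.

    pypdf exposes reader.outline as a nested list, where a sub-list holds
    the children of the entry just before it.
    """
    for item in outline:
        if isinstance(item, list):
            yield from walk_outline(item, depth + 1)
        else:
            yield depth, item


def page_ranges(starts, total_pages):
    """Turn sorted start pages into half-open (start, end) page ranges."""
    ends = list(starts[1:]) + [total_pages]
    return list(zip(starts, ends))


def split_by_outline(src, out_dir):
    """Write one PDF per top-level outline entry (hypothetical helper)."""
    from pypdf import PdfReader, PdfWriter  # third-party: pip install pypdf

    reader = PdfReader(src)
    # Assumption: each top-level bookmark marks the first page of a tune.
    entries = []
    for depth, item in walk_outline(reader.outline):
        page = reader.get_destination_page_number(item)
        if depth == 0 and page is not None:
            entries.append((page, item.title))
    entries.sort(key=lambda entry: entry[0])

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    starts = [page for page, _ in entries]
    ranges = page_ranges(starts, len(reader.pages))
    for (_, title), (start, end) in zip(entries, ranges):
        writer = PdfWriter()
        for index in range(start, end):
            writer.add_page(reader.pages[index])
        # Keep file names simple: drop characters that are awkward in paths.
        safe = "".join(c for c in title if c.isalnum() or c in " -_").strip()
        with open(out_dir / f"{safe or start}.pdf", "wb") as handle:
            writer.write(handle)
```

Something like split_by_outline("tunebook.pdf", "tunes") would then produce one file per bookmark (both paths are placeholders). If the book has no outline, the start pages would have to come from somewhere else, such as a hand-typed list.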
yeah, I've been using pypdf mainly, camelot-py for some table stuff, and a bit of pdfminer

I've been needing something to see the x/y bounds of tables to fix some edge cases with camelot. There seem to be some good links in the comments here.
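One possible way to get at those bounds, as a sketch only: pdfminer.six reports a bbox of (x0, y0, x1, y1) for each layout element, with the origin at the bottom-left of the page, and camelot's table_areas option takes "x1,y1,x2,y2" strings with the top-left corner first. The helper names below are made up, and picking which text blocks actually belong to a table is left to the reader.

```python
def union_bbox(boxes):
    """Smallest (x0, y0, x1, y1) box that covers every input box."""
    x0s, y0s, x1s, y1s = zip(*boxes)
    return (min(x0s), min(y0s), max(x1s), max(y1s))


def to_table_area(bbox):
    """Format a bottom-left-origin bbox as a camelot table_areas string.

    camelot expects the top-left corner first, then the bottom-right,
    so the two y values swap places.
    """
    x0, y0, x1, y1 = bbox
    return f"{x0:.0f},{y1:.0f},{x1:.0f},{y0:.0f}"


def text_boxes(pdf_path, page_number=1):
    """Return the bounding boxes of text blocks on one page (1-based)."""
    # Third-party: pip install pdfminer.six
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer

    # extract_pages uses zero-based page numbers.
    for page in extract_pages(pdf_path, page_numbers=[page_number - 1]):
        return [el.bbox for el in page if isinstance(el, LTTextContainer)]
    return []
```

The resulting string could then be passed along the lines of camelot.read_pdf(path, flavor="stream", table_areas=[area]) to pin the parser to that region.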