Hacker News
Ask HN: What's a good library/command line tool to extract tables from PDFs?
5 points by alfarez 1104 days ago
4 comments

There are probably newer AI-powered tools, but Tabula is the main library I know of: https://github.com/tabulapdf/tabula-java
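Here is a minimal sketch of driving Tabula from a script. It assumes the tabula-py package (a Python wrapper around tabula-java, which the thread does not mention) and a Java runtime are installed. The helper name `pdf_tables_to_csv` and the one-CSV-per-table layout are hypothetical choices, not part of Tabula itself.

```python
from pathlib import Path


def pdf_tables_to_csv(pdf_path: str, out_dir: str) -> list[Path]:
    """Write each table Tabula detects in pdf_path to its own CSV file in out_dir.

    Hypothetical helper: assumes tabula-py (and a Java runtime) is installed.
    """
    # Imported inside the function so this file can be loaded without tabula-py.
    import tabula

    # read_pdf returns a list of pandas DataFrames, one per detected table.
    tables = tabula.read_pdf(pdf_path, pages="all", multiple_tables=True)

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for i, df in enumerate(tables, start=1):
        # e.g. report.pdf -> report_table1.csv, report_table2.csv, ...
        target = out / f"{Path(pdf_path).stem}_table{i}.csv"
        df.to_csv(target, index=False)
        written.append(target)
    return written
```

Calling `pdf_tables_to_csv("report.pdf", "out")` would write `out/report_table1.csv` and so on. If one combined CSV is enough, tabula-py's `tabula.convert_into("report.pdf", "out.csv", output_format="csv", pages="all")` does the conversion in a single call.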
You can use a PDF parser tool to extract data from PDF tables. I'm building parsio.io; we use pre-trained, AI-powered parsers to parse PDF tables: https://parsio.io/table-extraction/. Another example is Tabula (free).
There is also this option: https://docs.ropensci.org/tabulizer/
I haven't tried it, but this has been in my bookmarks for a while: https://github.com/camelot-dev/excalibur
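Excalibur is a web front end for the Camelot Python library, so the same extraction can also be scripted with Camelot directly. This is a rough sketch, assuming `camelot-py` is installed along with whatever rendering backend its lattice mode needs in your version. `camelot_tables_to_csv` is a hypothetical helper name.

```python
def camelot_tables_to_csv(pdf_path: str, out_path: str, flavor: str = "lattice") -> list[dict]:
    """Extract tables with Camelot, export them as CSV, and return each table's parsing report.

    Hypothetical helper: assumes camelot-py and its dependencies are installed.
    """
    # Imported inside the function so this file can be loaded without Camelot.
    import camelot

    # "lattice" relies on ruling lines between cells; "stream" infers columns from whitespace.
    tables = camelot.read_pdf(pdf_path, pages="all", flavor=flavor)

    # export() writes one CSV per table, with file names derived from out_path.
    tables.export(out_path, f="csv")

    # Each report is a dict with accuracy, whitespace, order and page.
    return [t.parsing_report for t in tables]
```

The returned reports give a quick way to spot tables that Camelot split or merged badly. Switching `flavor` to `"stream"` is the usual fallback for tables drawn without ruling lines.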