Ask HN: What's a good library/command line tool to extract tables from PDFs?
5 points
by
alfarez
1104 days ago
4 comments
UglyToad
1104 days ago
There are probably newer AI-powered tools, but Tabula is the main library I know of:
https://github.com/tabulapdf/tabula-java
andrewio
1102 days ago
You can use a PDF parser tool to extract data from PDF tables. I'm building parsio.io; we use pre-trained AI-powered parsers to parse PDF tables:
https://parsio.io/table-extraction/
Another example is Tabula (free).
phiv
1098 days ago
There is also this option:
https://docs.ropensci.org/tabulizer/
phiv
1103 days ago
I have not tried it, but this has been in my bookmarks for a while:
https://github.com/camelot-dev/excalibur