by divbzero 2340 days ago
Do you have recommendations for which libraries or APIs currently perform best at extracting tables and text?

Scanning the comments I see two mentions of Camelot [1] and one mention each of PDFTron [2] and ExtractTable [3].

[1]: https://camelot-py.readthedocs.io/en/master/

[2]: https://www.pdftron.com/pdf-tools/pdf-table-extraction/

[3]: https://extracttable.com/

Would love to hear if you’ve compared multiple options.