by jszymborski 748 days ago
Apache Tika could help extract the relevant bits of PDFs, couldn't it?

https://tika.apache.org/
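For anyone who hasn't used it, here is a rough sketch of plain-text extraction in Java. It assumes the `org.apache.tika.Tika` facade plus a parser bundle (e.g. `tika-parsers-standard-package`) are on the classpath. The class name `PdfText` and the helper `extractText` are hypothetical names for illustration. This only gets you the raw text, so deciding which bits are "relevant" is still up to you.

```java
import java.io.File;
import java.io.IOException;

import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;

// Hypothetical example class; not part of Tika itself.
public class PdfText {

    // Hypothetical helper: detects the file type (PDF or otherwise)
    // and returns whatever plain text Tika's parsers can pull out of it.
    static String extractText(File file) throws IOException, TikaException {
        Tika tika = new Tika();
        // parseToString truncates to 100,000 chars by default; -1 removes the limit.
        tika.setMaxStringLength(-1);
        return tika.parseToString(file);
    }

    public static void main(String[] args) throws IOException, TikaException {
        // Usage: java PdfText some-paper.pdf
        System.out.println(extractText(new File(args[0])));
    }
}
```

The facade is the simplest entry point. If you also want document metadata (author, title, page count), Tika's lower-level parser API exposes that too.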