by zkanda 2104 days ago
We had a problem where we needed to index and make searchable hundreds of thousands of government PDF files, some as old as 15 years.

Tried a bunch of libraries and settled on Tika. Although we were a PHP/Node shop, nothing compared to the ease of using Tika for this exact purpose.