

by jjohansson 2371 days ago

PDFTron is an SDK, not a plug-and-play end-user application, but it can accomplish what you're looking for.

Here's how to extract text from a PDF based on coordinates (this explains how to do it on the web, but it's also possible on other platforms):
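To give a rough idea of what coordinate-based extraction boils down to, here is a minimal, library-agnostic sketch in Python. It is not PDFTron's API: the `Word` class and `text_in_rect` function are hypothetical names made up for illustration, and it assumes you already have each word's bounding box in PDF user space (origin at the bottom-left of the page).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """Hypothetical record: one word plus its bounding box in PDF points."""
    text: str
    x1: float  # left
    y1: float  # bottom
    x2: float  # right
    y2: float  # top


def text_in_rect(words, rect):
    """Return the text of all words that lie fully inside rect.

    rect is (x1, y1, x2, y2) in the same coordinate space as the words.
    """
    rx1, ry1, rx2, ry2 = rect
    inside = [
        w for w in words
        if w.x1 >= rx1 and w.y1 >= ry1 and w.x2 <= rx2 and w.y2 <= ry2
    ]
    # Assumed reading order: top of the page first (larger y), then left to right.
    inside.sort(key=lambda w: (-w.y1, w.x1))
    return " ".join(w.text for w in inside)


# Made-up sample data, not taken from a real document.
words = [
    Word("Invoice", 72, 700, 120, 712),
    Word("Total:", 72, 100, 110, 112),
    Word("42.00", 120, 100, 150, 112),
]
print(text_in_rect(words, (60, 90, 200, 120)))
```

An SDK would normally supply the word boxes and handle rotation, clipping, and partial overlaps for you; the sketch only shows the rectangle filter itself.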

Here's how to extract a PDF's logical structure:
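For context, a tagged PDF stores its logical structure as a tree of structure elements with standard types such as `Document`, `H1`, and `P`; untagged PDFs have no such tree. The sketch below is a hypothetical stand-in, not PDFTron's API: `StructElement` and `outline` are names invented here, and the tree is built by hand only to show what walking that structure looks like.

```python
from dataclasses import dataclass, field


@dataclass
class StructElement:
    """Hypothetical node of a tagged PDF's structure tree."""
    type: str                      # structure type, e.g. "Document", "H1", "P"
    text: str = ""                 # text content attached to this element, if any
    kids: list = field(default_factory=list)


def outline(elem, depth=0):
    """Depth-first walk that returns one indented line per structure element."""
    label = elem.type + (": " + elem.text if elem.text else "")
    lines = ["  " * depth + label]
    for kid in elem.kids:
        lines.extend(outline(kid, depth + 1))
    return lines


# Hand-built sample tree, purely illustrative.
tree = StructElement("Document", kids=[
    StructElement("H1", "Title"),
    StructElement("P", "Body text"),
])
print("\n".join(outline(tree)))
```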