Here's how to extract text from a PDF based on coordinates (this explains how to do it on the web, but it's also possible on other platforms):
https://groups.google.com/d/msg/pdfnet-webviewer/h2W3VksbQUI...
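
To illustrate the general idea behind coordinate-based extraction, here is a minimal, library-independent sketch in Python. It is not the API from the linked thread: the `Word` type and the `text_in_rect` function are hypothetical names, and the sketch assumes some PDF library has already given you each word with its bounding box in PDF user space (origin at the bottom-left of the page).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """Hypothetical record: one word plus its bounding box in PDF points."""
    text: str
    x1: float  # left
    y1: float  # bottom
    x2: float  # right
    y2: float  # top


def text_in_rect(words, rect):
    """Return the text of all words that lie fully inside rect.

    rect is (x1, y1, x2, y2) in the same coordinate space as the words.
    """
    rx1, ry1, rx2, ry2 = rect
    hits = [
        w for w in words
        if w.x1 >= rx1 and w.y1 >= ry1 and w.x2 <= rx2 and w.y2 <= ry2
    ]
    # Approximate reading order: top of page first (larger y), then left to right.
    hits.sort(key=lambda w: (-w.y1, w.x1))
    return " ".join(w.text for w in hits)


# Made-up sample data standing in for what a PDF library would return.
words = [
    Word("Total:", 72, 700, 110, 712),
    Word("42.00", 115, 700, 150, 712),
    Word("Footer", 72, 30, 110, 42),
]
print(text_in_rect(words, (70, 690, 200, 720)))
```

A real extractor would also have to decide what to do with words that only partly overlap the rectangle; this sketch simply drops them.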
Here's how to extract a PDF's logical structure:
https://www.pdftron.com/documentation/samples/#logicalstruct...
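
As a rough picture of what working with a logical structure looks like, the sketch below walks a small tree of structure elements depth-first and prints an indented outline. It is not the code from the linked sample: `StructElem` and `outline` are hypothetical names, and the tree is built by hand here, whereas in practice it would come from a tagged PDF's structure tree via whatever library you use.

```python
from dataclasses import dataclass, field


@dataclass
class StructElem:
    """Hypothetical stand-in for a tagged-PDF structure element."""
    kind: str                 # structure type such as "Document", "H1", "P"
    text: str = ""            # text content attached to this element, if any
    kids: list = field(default_factory=list)


def outline(elem, depth=0):
    """Return one indented line per element, visiting the tree depth-first."""
    label = elem.kind + (": " + elem.text if elem.text else "")
    lines = ["  " * depth + label]
    for kid in elem.kids:
        lines.extend(outline(kid, depth + 1))
    return lines


# Hand-built sample tree, purely for illustration.
doc = StructElem("Document", kids=[
    StructElem("H1", "Intro"),
    StructElem("P", "Hello"),
])
print("\n".join(outline(doc)))
```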