by gcanyon 1016 days ago
I’m interested in extracting the contents of a PDF form with many individual text boxes. You’re saying LibreOffice would likely be able to parse that PDF into a usable format?

Poppler ( https://poppler.freedesktop.org/ ) handles this for you with its pdftotext utility. It also ships with a bunch of other utilities for working with PDFs.
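For anyone who wants to script this, here is a rough sketch of calling pdftotext from Python. The helper names (build_pdftotext_cmd, extract_text) and the file name form.pdf are made up for illustration; the only Poppler pieces used are the pdftotext command, its -layout option, and "-" as the output file to write to stdout. I haven't checked how reliably filled-in form field values show up in the output, so treat that part as an assumption and test it on your own PDF.

```python
import shutil
import subprocess


def build_pdftotext_cmd(pdf_path, layout=True):
    """Build the argument list for a pdftotext call that writes to stdout.

    Hypothetical helper, not part of Poppler.
    """
    cmd = ["pdftotext"]
    if layout:
        # -layout asks pdftotext to keep the physical layout of the page text
        cmd.append("-layout")
    # Passing "-" as the text file makes pdftotext write to stdout
    cmd += [pdf_path, "-"]
    return cmd


def extract_text(pdf_path, layout=True):
    """Run pdftotext on pdf_path and return the extracted text as a string."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found; install Poppler's command-line utilities")
    result = subprocess.run(
        build_pdftotext_cmd(pdf_path, layout),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pdftotext exits with an error
    )
    return result.stdout
```

Calling extract_text("form.pdf") should give you the page text as one string, which you would still have to split into individual fields yourself.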
With LibreOffice Draw you can edit the PDF (modify the text, move or change images, etc.), then save it as a PDF, but it can't parse it and save it as .odt, .doc, .html or similar.
LibreOffice has some really perplexing functionality gaps.

The one that baffles me is that it doesn't understand its own graphics format, so you have to export drawings to TIFF or something (if I remember correctly).