Hacker News
by seszett 1016 days ago
Although this is an interesting dive into the PDF format, simply opening the PDF in LibreOffice or Inkscape usually works fine if you just want to modify its text.

I’m interested in extracting the contents of a PDF form with many individual text boxes. Are you saying LibreOffice would likely be able to parse that PDF into a usable format?
Poppler (https://poppler.freedesktop.org/) handles this for you with its pdftotext utility. It also ships with a bunch of other utilities for working with PDFs.
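If you want to script it, here is a minimal sketch that shells out to pdftotext from Python. It assumes the Poppler command-line tools are installed and on your PATH (on Debian/Ubuntu they come in the poppler-utils package). It relies only on two documented pdftotext behaviours: `-layout` keeps the physical page layout, and an output file of `-` sends the text to stdout. The function names are made up for illustration. One caveat: pdftotext extracts the text drawn on the page, so values typed into interactive form fields may not show up unless the form has been flattened. Check that on your own files.

```python
import shutil
import subprocess


def build_pdftotext_cmd(pdf_path: str, layout: bool = True) -> list[str]:
    """Build a pdftotext command line that writes the extracted text to stdout.

    "-layout" asks pdftotext to preserve the physical page layout, which
    helps keep form-like text boxes roughly where they were on the page.
    "-" as the output file means "print to stdout" instead of a .txt file.
    """
    cmd = ["pdftotext"]
    if layout:
        cmd.append("-layout")
    cmd += [pdf_path, "-"]
    return cmd


def extract_text(pdf_path: str, layout: bool = True) -> str:
    """Run Poppler's pdftotext on pdf_path and return the extracted text."""
    # Fail early with a clear message if the Poppler tools are missing.
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext not found; install the Poppler utilities")
    result = subprocess.run(
        build_pdftotext_cmd(pdf_path, layout),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode the output as str rather than bytes
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Usage would be something like `print(extract_text("form.pdf"))`. Dropping `layout=True` gives you plain reading-order text, which can be easier to parse if you don't care about column positions.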
With LibreOffice Draw you can edit the PDF (modify the text, move or change images, etc.) and then save it as a PDF again, but it can't parse it and save it as .odt, .doc, .html or similar.
LibreOffice has some really perplexing functionality gaps.

The one that baffles me is that it doesn't understand its own graphics format, so you have to export drawings to TIFF or something (if I remember correctly).

Pdfmaster is a good tool for this too, but the free version leaves a watermark.