by eschaton 1014 days ago
Anybody trying to do this is missing the point of PDF: It’s a page-description format and therefore only represents the marks on a page, not document structure.

One should not attempt to edit a PDF; one should edit the document from which the PDF is generated.

5 comments

I'll stop trying to edit PDFs when people stop sending me PDFs that I want to edit.

Somehow it became "unprofessional" to just send meant-to-be-editable documents around for everyone to enjoy, so this is where we end up...

It isn't "somehow"; there are some legitimate reasons why.

When I send a client a final document (one that's not intended to be edited) as a PDF, I can almost guarantee that it will look the same to them as it did to me. When I send someone a Word document, I can't guarantee that it will look the same across different versions of Word, Mac Word, Pages, Google Docs, etc.

I'm not saying .PDF formats are perfect, but they're certainly more consistently presented to the end user.

> When I send someone a Word document, I can't guarantee that it will look the same

Exactly! Isn't that wonderful? People can view it on a desktop, a phone, a text-to-speech engine if they want to have it in audio form... it all just works because the tags are preserved that show what's supposed to be a heading, where new paragraphs start, whether text to the right is a side box or a continuation of the line, etc.

> certainly more consistently presented to the end user

   We all love horizontal scrolling to read your sentences, yes. Let's make everything preformatted text with a line length chosen by the sender. (Yes, I know HN now inserts line breaks in code blocks. Isn't it nice to have reflowed text?)

It's that age-old problem where people send documents that need to be edited by the recipient in a structured way.

The same thing happens with Word docs that are sent out formatted in a particular way, like PDFs (e.g. a questionnaire), where the recipient needs to edit the document and then send it back.

HTML forms are another example.

All these years later, there is still no globally standard way to achieve this quickly and easily, and yet it would seem a perfect problem for the open source world to tackle.

Yes, this is exactly the source of the problem. Otherwise editing would rarely be a problem or even needed. And PDF is extremely ill-suited for this. I remember once, in a pressing situation, printing a document, filling it in by hand, and scanning it back in. It felt byzantinely complicated and equally frustrating.

PDF does support incorporating information about the logical document structure, aka Tagged PDF. It’s optional, but recommended for accessibility (e.g. PDF/UA). See chapters 14.7–14.8 in [1].

[1] https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandard...
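
For anyone curious whether a particular file claims to be tagged, here is a rough sketch in Python. It is a byte-level heuristic, not a PDF parser: it only looks for the two catalog entries that a Tagged PDF declares (`/StructTreeRoot` and `/MarkInfo` with `/Marked true`). The function name `looks_tagged` is made up for this example, and the check assumes the document catalog is stored uncompressed; if the catalog sits inside a compressed object stream, this scan will miss it and report `False`.

```python
import re


def looks_tagged(pdf_bytes: bytes) -> bool:
    """Heuristic check for the catalog entries that a Tagged PDF declares.

    Assumption: the document catalog is stored uncompressed. If it lives in
    a compressed object stream, this byte scan will not see it.
    """
    # /StructTreeRoot points at the root of the logical structure tree.
    has_struct_tree = b"/StructTreeRoot" in pdf_bytes
    # /MarkInfo << /Marked true >> states that the file follows Tagged PDF conventions.
    is_marked = re.search(rb"/Marked\s+true", pdf_bytes) is not None
    return has_struct_tree and is_marked
```

To try it, read a file in binary mode and pass the bytes in, e.g. `looks_tagged(open("some.pdf", "rb").read())`. A `True` result only means the file declares itself tagged; it says nothing about how good the tagging is.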

"Should not" is meaningless here, because in the real world there are tons of situations where people want you to edit a PDF, one way or another.

One challenge is marking up corrections that need to go back to the source document. I get proofs from a typesetter, and I need to mark them up for the typesetter to fix. I can't change the PDF text, because the typesetter won't see that. Acrobat's markup tools aren't terrible, but they don't quite match what I could do in the days of paper and red pencils, unless I use the 'pencil' tool in Acrobat. I'd like to see that improved.

> It’s a page-description format and therefore only represents the marks on a page, not document structure

Maybe they should have called it ‘Page Description Format’ then, instead of ‘Portable Document Format’?