by aidos
159 days ago
You can replace objects in PDF documents. A PDF is mostly just a bunch of typed objects; the type tells readers what to do with each one. Each object has a numbered ID. I recommend mutool for decompressing the PDF so you can read it in a text editor:
mutool clean -d in.pdf out.pdf
If you look below you can see a Pages list (1 0 obj) that references (2 0 R) a Page (2 0 obj).
1 0 obj
<<
/Type /Pages
/Count 1
/Kids [ 2 0 R ]
>>
endobj
2 0 obj
<<
/Type /Page
/Contents 5 0 R
...
>>
endobj
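To make the object/reference structure concrete, here is a rough Python sketch that scans a decompressed PDF for "N G obj ... endobj" blocks and lists the "N G R" references inside each. It assumes the file is plain text after decompression and ignores stream contents, strings, and comments, so it is not a real parser. The name list_objects is made up for illustration.

```python
import re

# Naive patterns: only reasonable on a decompressed, text-like PDF.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)
REF_RE = re.compile(rb"(\d+)\s+(\d+)\s+R\b")


def list_objects(data: bytes) -> dict:
    """Map (object number, generation) to the references found in its body."""
    objects = {}
    for m in OBJ_RE.finditer(data):
        key = (int(m.group(1)), int(m.group(2)))
        refs = [(int(n), int(g)) for n, g in REF_RE.findall(m.group(3))]
        objects[key] = refs
    return objects


sample = b"""1 0 obj
<<
/Type /Pages
/Count 1
/Kids [ 2 0 R ]
>>
endobj
2 0 obj
<<
/Type /Page
/Contents 5 0 R
>>
endobj
"""

for key, refs in list_objects(sample).items():
    print(key, "->", refs)
```

On the two objects above, this reports that object 1 points at object 2 and object 2 points at object 5.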
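Rather than editing the PDF in place, you can overwrite an object by appending a new copy of it to the end of the file, followed by a new cross-reference section and trailer that point at the new copy (an "incremental update"). In the usual case the replacement keeps the same object number and generation (still 1 0 here); the generation number only goes up when an object number is freed and later reused. This leaves the original PDF bytes intact while making edits.
1 0 obj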
<<
/Type /Pages
/Count 2
/Kids [ 2 0 R 200 0 R ]
>>
endobj
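For anyone who wants to try appending an object by hand, here is a minimal Python sketch of an incremental update. It is written under several assumptions: the original file uses a classic xref table (not a cross-reference stream), the replacement keeps the same object and generation number (the common case for updates), and the caller supplies the Root reference and the new /Size. The function name append_update and its parameters are hypothetical, not part of any library.

```python
import re


def append_update(pdf: bytes, obj_num: int, body: bytes,
                  root: bytes, size: int, gen: int = 0) -> bytes:
    """Return the original bytes plus one replacement object, xref, and trailer."""
    # The last startxref value is the offset of the previous xref section.
    found = re.findall(rb"startxref\s+(\d+)", pdf)
    if not found:
        raise ValueError("no startxref found")
    prev = int(found[-1])

    out = bytearray(pdf)
    if not out.endswith(b"\n"):
        out += b"\n"

    # New copy of the object, appended after the untouched original bytes.
    obj_offset = len(out)
    out += b"%d %d obj\n" % (obj_num, gen) + body + b"\nendobj\n"

    # One xref subsection with a single 20-byte entry for the changed object.
    xref_offset = len(out)
    out += b"xref\n%d 1\n" % obj_num
    out += b"%010d %05d n \n" % (obj_offset, gen)

    # /Prev chains back to the earlier xref section so older objects still resolve.
    out += b"trailer\n<< /Size %d /Root %s /Prev %d >>\n" % (size, root, prev)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)
```

Calling it with obj_num=1 and the new Pages dictionary as body would add the second Kid without touching the first part of the file. The referenced page object (200 0 R above) would still need to be appended the same way before a reader could show it.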
You can have pretty much anything you want inside a PDF, and it can be orphaned so that a PDF reader never picks up on it. Nothing says an object has to be referenced. (Readers know where to start because a "trailer" at the end of the PDF says where the Root node is.)
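If you want to see which objects are orphaned, a crude approach is to start at the Root named in the trailer, follow every reference, and report whatever was never reached. The Python sketch below does that with regexes on a decompressed file. It assumes text-like content, takes the last /Root it finds, and ignores object streams and encryption, so treat it as an illustration only. The name find_orphans is hypothetical.

```python
import re

# Naive patterns: only reasonable on a decompressed, text-like PDF.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)
REF_RE = re.compile(rb"(\d+)\s+(\d+)\s+R\b")
ROOT_RE = re.compile(rb"/Root\s+(\d+)\s+(\d+)\s+R\b")


def find_orphans(data: bytes) -> set:
    """Return (number, generation) pairs not reachable from the trailer's Root."""
    # A later copy of the same object replaces an earlier one in this dict,
    # roughly like an appended object superseding the original.
    bodies = {(int(m.group(1)), int(m.group(2))): m.group(3)
              for m in OBJ_RE.finditer(data)}

    roots = ROOT_RE.findall(data)
    if not roots:
        raise ValueError("no /Root entry found")
    start = (int(roots[-1][0]), int(roots[-1][1]))

    # Depth-first walk over references, skipping ones that are never defined.
    seen, stack = set(), [start]
    while stack:
        key = stack.pop()
        if key in seen or key not in bodies:
            continue
        seen.add(key)
        stack.extend((int(n), int(g)) for n, g in REF_RE.findall(bodies[key]))

    return set(bodies) - seen
```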
|
So it works kind of like a soft delete — dereference instead of scrubbing the bits.
Is this behavior generally explicitly defined in PDF editors (i.e. an intended feature)? Is it defined in some standard or set of best practices? Or is it a hack (or half-baked feature) someone implemented years ago that has just kind of stuck around and propagated?