This is cool! Here are some other similar(?) tools, for seeing the inner contents of a PDF file (the raw objects etc), but I haven't compared them to this tool here:

- https://pdf.hyzyla.dev/
- https://github.com/itext/i7j-rups (java -jar ~/Downloads/itext-rups-7.2.5.jar)
- https://github.com/desgeeko/pdfsyntax (python3 -m pdfsyntax inspect foo.pdf > output.html)
- https://github.com/trailofbits/polyfile (polyfile --html output.html foo.pdf)
- https://www.reportmill.com/snaptea/PDFViewer/ = https://www.reportmill.com/snaptea/PDFViewer/pviewer.html (drag PDF onto it)
- https://sourceforge.net/projects/pdfinspector/ (an "example" of https://superficial.sourceforge.net/)
- https://www.o2sol.com/pdfxplorer/overview.htm

More?
By way of example, here's an object that represents a Page. You can see the dimensions in the MediaBox. The contents themselves live in object "9 0 obj" ("9 0 R" is the reference that points to it).
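As a rough illustration of how those pieces relate, here is a small Python sketch. The object text in PAGE_OBJ is made up for this example: only the "/Contents 9 0 R" entry mirrors the comment above, while the object number 3, the /Parent reference and the 612 x 792 MediaBox are assumptions. The regex parsing is only good enough for this one string and is nowhere near a real PDF parser.

```python
import re

# Hypothetical Page object. Only "/Contents 9 0 R" comes from the text above;
# the object number, /Parent and the MediaBox values are assumed.
PAGE_OBJ = """\
3 0 obj
<<
  /Type /Page
  /Parent 2 0 R
  /MediaBox [0 0 612 792]
  /Contents 9 0 R
>>
endobj
"""


def media_box(obj_text):
    """Return the MediaBox as [llx, lly, urx, ury] floats."""
    m = re.search(r"/MediaBox\s*\[([^\]]*)\]", obj_text)
    return [float(v) for v in m.group(1).split()]


def contents_ref(obj_text):
    """Return (object number, generation) from '/Contents N G R'."""
    m = re.search(r"/Contents\s+(\d+)\s+(\d+)\s+R", obj_text)
    return int(m.group(1)), int(m.group(2))


box = media_box(PAGE_OBJ)
width, height = box[2] - box[0], box[3] - box[1]
num, gen = contents_ref(PAGE_OBJ)
print(f"page is {width:g} x {height:g} points")
print(f"content stream lives in '{num} {gen} obj'")
```

The point is just that "9 0 R" is an indirect reference: to find the drawing instructions you look up the object whose header reads "9 0 obj".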
Meanwhile "9 0 obj" has the drawing instructions. They seem a little weird at first glance, but you can see the values ".23999999 0 0 -.23999999 0 792" each get pushed onto the stack, and then "cm" pops them and interprets them as the transformation matrix.

The depth and detail of all the different things that can be represented in a PDF is insane. But understanding the structure above is all you need to begin your journey!

EDIT: The rest of your journey is contained in this epic document: https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandard...
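To make the "cm" step concrete, here is a minimal Python sketch of an operand stack that understands only that one operator. The function names (run, concat, apply) are invented for this example, and it assumes the usual PDF convention that the six numbers a b c d e f map a point to x' = a*x + c*y + e, y' = b*x + d*y + f. Real content streams have many more operators and token types than this handles.

```python
def concat(m, ctm):
    """Return m x ctm for two 6-number PDF matrices (a, b, c, d, e, f)."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = ctm
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


def run(stream):
    """Tiny operand-stack interpreter that understands only 'cm'."""
    stack = []
    ctm = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)  # identity matrix to start with
    for token in stream.split():
        try:
            stack.append(float(token))  # numbers are operands: push them
        except ValueError:
            if token != "cm":
                raise NotImplementedError(token)
            operands = tuple(stack[-6:])  # the operator pops its six operands
            del stack[-6:]
            ctm = concat(operands, ctm)
    return ctm


def apply(ctm, x, y):
    """Map a point through the current transformation matrix."""
    a, b, c, d, e, f = ctm
    return (a * x + c * y + e, b * x + d * y + f)


ctm = run(".23999999 0 0 -.23999999 0 792 cm")
print(apply(ctm, 0, 0))      # where the content's origin lands on the page
print(apply(ctm, 100, 100))  # x is scaled down, y is scaled and flipped
```

With this matrix the origin of the drawing instructions ends up at y = 792 and the negative d term flips the y axis. The scale of roughly 0.24 equals 72/300, which would be consistent with content laid out in 300-per-inch units measured from the top of the page, though that is a guess about this particular file.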