
You can get a long way by implementing only the most basic parts of the PDF specification, such as section 7. And even there you don't need everything. For example, there is no need to implement the CCITTFaxDecode, JBIG2Decode, DCTDecode or JPXDecode filters unless you need access to the raw pixels of the images.
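To illustrate how little of the filter machinery you need, here is a minimal Python sketch. `decode_stream` is a hypothetical helper, not from any library. It decodes FlateDecode (via `zlib`) and ASCIIHexDecode, then stops at the first image filter and hands back the still-encoded data, which can be copied through unchanged. It deliberately ignores /DecodeParms (e.g. PNG predictors), which a real parser would have to handle.

```python
import zlib

# Filters whose encoded form is itself an image format (CCITT G3/G4, JBIG2,
# JPEG, JPEG 2000). If you never need raw pixels, leave them encoded.
IMAGE_FILTERS = {"CCITTFaxDecode", "JBIG2Decode", "DCTDecode", "JPXDecode"}

# PDF white-space characters (NUL, HT, LF, FF, CR, SP).
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def decode_stream(data: bytes, filters: list[str]) -> tuple[bytes, list[str]]:
    """Apply the /Filter chain in order, stopping at the first image filter.

    Returns the decoded bytes plus the filters that were left unapplied.
    /DecodeParms (e.g. PNG predictors) are ignored in this sketch.
    """
    for i, name in enumerate(filters):
        if name in IMAGE_FILTERS:
            # Hand back the still-encoded data, e.g. a complete JPEG for DCTDecode.
            return data, filters[i:]
        if name == "FlateDecode":
            data = zlib.decompress(data)
        elif name == "ASCIIHexDecode":
            digits = bytes(b for b in data if b not in PDF_WHITESPACE)
            digits = digits.split(b">", 1)[0]  # '>' marks the end of the data
            if len(digits) % 2:
                digits += b"0"  # an odd final digit is padded with 0
            data = bytes.fromhex(digits.decode("ascii"))
        else:
            raise NotImplementedError(f"filter {name} not supported")
    return data, []
```

For a `[/FlateDecode /DCTDecode]` image stream this returns the raw JPEG bytes and `["DCTDecode"]`, which is all a writer needs to re-embed the image.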

Once you have parsing and writing of a simple PDF file working (sections 7.2, 7.3, 7.4, 7.5, 7.7), add support for encryption (section 7.6). At that point you can at least parse and write nearly all PDF files.
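The writing half of section 7.5 (header, body, cross-reference table, trailer) is small enough to sketch in a few lines of Python. `write_pdf` is a hypothetical helper; it assumes every object has generation 0, that object 1 is the catalog, and that no encryption is used. The detail that trips people up is that each xref entry is exactly 20 bytes and records the byte offset of its object.

```python
def write_pdf(objects: list[bytes]) -> bytes:
    """Write objects as numbers 1..n (generation 0) into a complete PDF file.

    objects[0] is assumed to be the document catalog.
    """
    # Header plus a comment line with bytes >= 128 so tools treat the file as binary.
    out = bytearray(b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj"
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    xref_pos = len(out)
    size = len(objects) + 1  # includes the free entry for object 0
    out += b"xref\n0 %d\n" % size
    # Each entry is exactly 20 bytes: 10-digit offset, 5-digit generation,
    # type keyword, and a two-character end of line (here SP LF).
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % size
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


# Usage: a one-page document with an empty page.
pdf = write_pdf([
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Resources << >> >>",
])
```

Parsing is the same structure read backwards: find `startxref` at the end, jump to the xref table, and use the offsets to locate objects on demand.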

Then gradually implement the things you need. For example:

* Need support for parsing or creating the contents of a page? -> sections 7.8, 8, and 9. Mind you, start out by supporting only the built-in PDF fonts (the standard 14 fonts) for creating text, and later add support for TrueType (easier) and OpenType (harder if you need to implement the font parser yourself).

* Need support for annotations? -> section 12.5

And so on.
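For page contents that use only a built-in font, a sketch along these lines is enough to get text on a page. The helper names (`pdf_string`, `text_content`, `FONT_DICT`) are hypothetical. It assumes the page's resources map /F1 to the font dictionary (`/Resources << /Font << /F1 N 0 R >> >>`), that the returned bytes are wrapped in a stream object with a matching /Length, and that Python's `cp1252` codec is a close enough stand-in for WinAnsiEncoding; characters outside it raise an error.

```python
# Font dictionary for a standard 14 font: no font program needs to be embedded.
FONT_DICT = (b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica "
             b"/Encoding /WinAnsiEncoding >>")


def pdf_string(text: str) -> bytes:
    """Encode text as a PDF literal string for a WinAnsiEncoding font."""
    raw = text.encode("cp1252")  # assumption: treat WinAnsiEncoding as cp1252
    # Backslash first, so the escapes added for parentheses are not doubled.
    raw = raw.replace(b"\\", b"\\\\").replace(b"(", b"\\(").replace(b")", b"\\)")
    return b"(" + raw + b")"


def text_content(lines: list[str], x: int = 72, y: int = 720,
                 size: int = 12, leading: int = 14) -> bytes:
    """Build a content stream that draws each line with font resource /F1."""
    ops = [
        b"BT",                    # begin text object
        b"/F1 %d Tf" % size,      # select font /F1 at the given size
        b"%d TL" % leading,       # set the leading used by T*
        b"%d %d Td" % (x, y),     # move to the start of the first line
    ]
    for line in lines:
        ops.append(pdf_string(line) + b" Tj T*")  # show text, go to next line
    ops.append(b"ET")             # end text object
    return b"\n".join(ops)
```

Swapping in a TrueType font later mostly changes the font dictionary and how strings are encoded; the content stream operators stay the same.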

If you just need to store metadata in the PDF, support for parsing and writing is enough, because being able to do that usually also means you can modify the PDF object tree, which is what storing the metadata requires. However, if you need to store that metadata in a way that other PDF processors can use, you need to store it as an XMP metadata stream, and creating that is yet another deep dive if you don't have an XMP library available. See section 14.3.2 of the PDF spec for this (btw. the latest PDF spec is available at no cost at https://pdfa.org/resource/iso-32000-2/).
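The embedding part is the easy bit; the deep dive is the XMP data model itself. As a rough sketch, assuming you only want a Dublin Core title and creator, a hand-written packet wrapped in a metadata stream looks like this. `xmp_metadata_stream` is a hypothetical helper; the result still has to be written as an indirect object and referenced from the catalog's /Metadata entry, and real-world packets usually carry more properties than these two.

```python
from xml.sax.saxutils import escape


def xmp_metadata_stream(title: str, creator: str) -> bytes:
    """Return a metadata stream object holding a minimal XMP packet."""
    packet = f"""<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
      xmlns:dc="http://purl.org/dc/elements/1.1/">
   <dc:title><rdf:Alt><rdf:li xml:lang="x-default">{escape(title)}</rdf:li></rdf:Alt></dc:title>
   <dc:creator><rdf:Seq><rdf:li>{escape(creator)}</rdf:li></rdf:Seq></dc:creator>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>""".encode("utf-8")
    # /Length counts the packet bytes only, not the EOL before "endstream".
    header = b"<< /Type /Metadata /Subtype /XML /Length %d >>" % len(packet)
    return header + b"\nstream\n" + packet + b"\nendstream"
```

Keeping the packet unfiltered (no /Filter entry) lets tools that scan for XMP packets find it without decoding the PDF.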