by gettalong 1303 days ago
As others have already written, there are many slightly invalid PDF files in the wild that most readers display mostly fine, and your library should be able to handle them too.

If you can, grab yourself a copy of the most recent PDF 2.0 specification since it contains much more information and is much more correct in terms of how to implement things. Also have a look at the errata at https://pdf-issues.pdfa.org/32000-2-2020/index.html.

As I'm implementing a PDF library (in Ruby), I have started to collect some situations that arise in the wild but are not spec-compliant; see https://github.com/gettalong/annotated-pdf-spec. That might help you in parsing some invalid PDFs.

1 comment

Merely for your consideration: if those were actual issues on that repo, (a) it would allow adding labels to them (as in https://github.com/pdf-association/pdf-issues/issues?q=is%3A... ), (b) folks could comment, so the issues would act as a low-rent Stack Overflow, and (c) it would allow anyone to contribute new ones, versus the "PR against README.md" situation right now.

That also more closely matches the mental model of those items: bugs against the specification, whether or not the official PDF Association agrees that they are.