by LegionMammal978
499 days ago
If you're interested in manipulating PDFs, I've found QPDF [0] to be a useful tool. Its "QDF mode" lays out the objects in a form where you can directly edit them, and it can automatically fix up the xref table afterwards. It can also convert to and from a JSON format that you can manipulate with your own scripts.

[0] https://github.com/qpdf/qpdf, https://qpdf.readthedocs.io/en/stable/
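For anyone who wants to try the QDF workflow described above, here is a rough sketch of how it might be scripted. It assumes the `qpdf` and `fix-qdf` executables are installed and on PATH; the helper names (`qdf_command`, `fix_qdf_command`, `roundtrip`) and the file paths are made up for illustration, not part of QPDF itself.

```python
import shutil
import subprocess
from pathlib import Path


def qdf_command(src: str, dst: str) -> list[str]:
    """Build the qpdf call that rewrites src as an editable QDF file."""
    # --qdf writes the objects in a normalized, text-editor-friendly layout.
    # --object-streams=disable keeps each object as a standalone top-level
    # object instead of packing objects into object streams.
    return ["qpdf", "--qdf", "--object-streams=disable", src, dst]


def fix_qdf_command(edited: str) -> list[str]:
    """Build the fix-qdf call that repairs a hand-edited QDF file."""
    # fix-qdf writes the repaired file to standard output.
    return ["fix-qdf", edited]


def roundtrip(src: str, editable: str, repaired: str) -> None:
    """Convert src to QDF, then rebuild the xref table after editing."""
    if shutil.which("qpdf") is None or shutil.which("fix-qdf") is None:
        raise RuntimeError("qpdf and fix-qdf must be on PATH")
    subprocess.run(qdf_command(src, editable), check=True)
    # ... edit `editable` here, by hand or with a script ...
    result = subprocess.run(
        fix_qdf_command(editable), check=True, capture_output=True
    )
    Path(repaired).write_bytes(result.stdout)
```

Calling `roundtrip("in.pdf", "editable.pdf", "fixed.pdf")` would run both steps back to back; in practice you would stop after the first step, make your edits in an editor that preserves binary data, and then run the `fix-qdf` step.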
https://qpdf.readthedocs.io/en/stable/json.html
https://www.jsonify.org
https://github.com/maximoguerrero/PDF-GPT4-JSON
PDF is such a curious format. It's not human-readable, it's not well-structured, it's not small. If it weren't for momentum and the political horse trading that Apple, Adobe and Microsoft were doing when the web went mainstream and freaked them out around 1995, I'm not sure that we'd be using it today. PostScript is better in countless ways, but since it's Turing-complete, it's not really ideal for storing static data, and to my knowledge it was never extended to handle binary data well, such as embedded JPEGs. I remember trying to print a 10 MB PS file in the 1990s, and it took maybe 20 minutes because the grayscale image was basically represented as a bunch of run-length encoded scan lines.
I would argue that frontend web development has reached a similar fate. It seems odd to use a programming language (imperative, no less) to design media that we used to describe declaratively. If I had enjoyed success in my programming career, I would work on a declarative representation of HTML/CSS/JavaScript that can represent the intersection of all existing markup across all mainstream browsers. Sort of like a mix between Markdown and CSS flexbox, like Xcode's auto layout, but universal. It frankly would probably look like HTML, but with sane defaults/builtins/inheritance, as well as a way to define and extend components from the beginning, similar to how people try to use data attributes. For contrast, React and Vue come at this from the opposite direction. I'm talking about something more like htmx.
Then we could work with that format and transpile to HTML or even React Native and dump 90-99% of the boilerplate and build tooling that we use currently.