by rstuart4133 2064 days ago
I don't know of a better alternative to PDF that was around at the time, but I can't say I'm a fan. It undeniably works well as a way of placing pixels precisely on a page, but then so does PNG, and PNG is far simpler and compresses better for computer-generated content.

Sadly, some information I only get as PDFs, so I have to scrape them. Easy, right? It can be, if the PDF is structured sanely. But PDF isn't some well-defined data structure for laying out the page, it's a Turing-complete, stack-based computer program that can do whatever it damned well pleases. The font tables don't necessarily have ' '=32, 'A'=65, 'a'=97. Why not optimise it and get rid of all those gaps, so now ' '=0, 'A'=30? And the text doesn't have to be drawn in any sane order. It can be just a mess that makes even copy & paste near impossible, and some are.
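To make the two problems concrete, here is a rough Python sketch of what a scraper ends up doing. It assumes the text fragments have already been pulled out of the page with their positions. `Fragment`, `TO_UNICODE`, `decode` and `reading_order` are names made up for this illustration, not any real library's API, and the table only borrows ' '=0 and 'A'=30 from the example above (the other entries are invented).

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    x: float      # left edge of the fragment, in page units
    y: float      # baseline; PDF-style, so larger y is higher on the page
    codes: bytes  # raw character codes as the font numbers them


# Hypothetical remapped font: ' '=0 and 'A'=30 as above, the rest invented.
TO_UNICODE = {0: " ", 30: "A", 31: "B", 32: "C"}


def decode(codes: bytes, table: dict) -> str:
    """Map font-specific codes to characters; unknown codes become U+FFFD."""
    return "".join(table.get(code, "\ufffd") for code in codes)


def reading_order(fragments, table, line_tolerance: float = 2.0) -> str:
    """Rebuild text from fragments that were drawn in arbitrary order."""
    lines = []      # each entry is the list of fragments on one line
    line_y = None   # baseline of the first fragment on the current line
    # Walk top to bottom, grouping fragments whose baselines nearly match.
    for frag in sorted(fragments, key=lambda f: -f.y):
        if line_y is None or abs(frag.y - line_y) > line_tolerance:
            lines.append([])
            line_y = frag.y
        lines[-1].append(frag)
    # Within each line, read left to right.
    return "\n".join(
        "".join(decode(f.codes, table) for f in sorted(line, key=lambda f: f.x))
        for line in lines
    )


# Fragments listed in "drawing order", which is not reading order.
page = [
    Fragment(x=20, y=700.0, codes=bytes([31])),
    Fragment(x=10, y=680.0, codes=bytes([32, 99])),
    Fragment(x=10, y=700.5, codes=bytes([30, 0])),
]
text = reading_order(page, TO_UNICODE)
```

Even this toy version needs a tolerance guess to decide what counts as one line, and it gives up on any code the table doesn't cover. Real files add rotated text, columns and fonts with no usable mapping at all, which is where it stops being easy.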

Did we really need to invent a DSL that has to be executed every time we want to view a page? I remember it being pushed as a cool solution at the time. It doesn't look so cool now. SVG would be an improvement.

2 comments

PNG doesn't support multiple pages and didn't supplant GIF until 2000 or so. TIFF does support multiple pages, but in practice it's always uncompressed (did it even support compression in the 90s?). Neither solution allowed for text blocks, vector zooming, or form fields.

It's not difficult to improve upon a sane subset of PDF, but that would require backing and coordination. Reviving XPS (but not under MS auspices) should also be possible.

PNG is of course an image format, and that means it doesn't really do text well. (Oh, and PDF is fully Turing complete and can even execute JavaScript, so in some contexts calling it a DSL is straining the definition a bit.)