by MichalSternik 1803 days ago
Well, what's wrong with static site (generators)?

I certainly get the argument, but using something like hugo or gatsby or jekyll when you want to avoid the "churn" also seems like a perfectly valid solution.

3 comments

The author addresses this pretty well. Because you can embed whatever you want, static site generators aren't really static. In particular, Jekyll blogs and what not still pretty commonly include comment sections.

Of course, PDFs aren't necessarily static either, but that is why Lab6 is choosing to use PDF/A, an actually static format intended specifically for long-term archiving of immutable files. This way you can sign the file, and anyone can verify that it has stayed the same and that everyone's copy is identical.
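
To make the "everyone's copy is identical" part concrete, here is a rough sketch of the digest comparison that a signature scheme would sit on top of. It is only an assumption about how you might check this, not what Lab6 actually does; the function names and sample bytes are made up for illustration, and a real signature would additionally need a key pair.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def copies_match(copy_a: bytes, copy_b: bytes) -> bool:
    """True if two copies of a file have the same SHA-256 digest."""
    return sha256_hex(copy_a) == sha256_hex(copy_b)


# Hypothetical stand-ins for two downloads of the same archived file.
original = b"%PDF-1.4 example bytes"
tampered = b"%PDF-1.4 example bytes, edited"

print(copies_match(original, original))
print(copies_match(original, tampered))
```

Publishing the digest next to the file lets readers detect any change, which only works because the file itself never needs to change.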

I'm kind of surprised at the response to this. The author seems well aware of how terrible pdf is as a format and this isn't some treatise on why we should want to use it. It's an unfortunate compromise: given the requirements they're aiming to meet (generating a file that supports rich formatting and hyperlink embedding, but which can guarantee immutability and long-term archiving directly in the spec), pdf/a is all there is. So in spite of being a terrible format with a lot of shortcomings, it's what they're using.

Why don't they just use a static subset of HTML? You don't have to include comments sections, just like you don't have to include 3D CAD models and videos in your PDFs (yes you can do both of those, in theory anyway).
> The author addresses this pretty well. Because you can embed whatever you want, static site generators aren't really static. In particular, Jekyll blogs and what not still pretty commonly include comment sections.

But just like you can choose to use PDF/A, you can also choose to have a completely static and self-contained (e.g. using data URLs for images) HTML page.
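
As a minimal sketch of the data URL approach (the helper names are hypothetical, and you would supply real image bytes), this is roughly how an image can be inlined so the page needs no external requests:

```python
import base64
import html


def to_data_url(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def img_tag(data: bytes, mime_type: str, alt: str) -> str:
    """Build an <img> tag whose source is embedded in the page itself."""
    safe_alt = html.escape(alt, quote=True)
    return f'<img src="{to_data_url(data, mime_type)}" alt="{safe_alt}">'


# Tiny demonstration with placeholder bytes instead of a real image.
print(to_data_url(b"hello", "text/plain"))
```

The trade-off is size: base64 inflates the payload by about a third, and the image can no longer be cached separately from the page.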

> pdf/a is all there is

Nobody is requiring you to use PDF/A. No mainline browser (that I'm aware of) requires it.

So what is being solved? When I click on a PDF on the web, I don't know if it's using PDF/A, I don't know if it's embedding or linking its fonts. So it's the same situation, nothing has changed.

Telling people to use PDF/A when most clients do not enforce it and when there's no indication to users before they click on a link whether or not the link is following the spec -- it is exactly the same as telling them to use a subset of HTML; the author is doing the same thing they complain about.

You can't just say that PDF/A exists. That's not enough. How will you get people to restrict themselves to that format when 99% of their users will never notice the difference and no client is enforcing it?

The only thing I like about PDF compared to HTML is that with PDF, I know for a fact that no web requests are made in the background. That means no fingerprinting, no analytics etc.

With HTML, I have to trust that some random entity does what they state in their privacy policy, and they regularly don't. Sure, I can disable JS, but then 95% of the web doesn't work anymore.

Other than that, PDF is quite clearly a less accessible format.

How do you know for a fact? PDF has JS in the spec, and it supports SOAP and Web Services. Have a look at https://www.adobe.com/go/acrobatsdk_jsdevguide
That's not the PDF spec, is it? That is a spec for Adobe Acrobat, which is not allowed to make any web requests thanks to my application firewall (Little Snitch).

Pretty sure a PDF opened in the browser can't run any JS, but not completely sure. So you're right: I don't really know it for a fact. Poor choice of words.

The spec is ISO 32000, and it’s expensive and closed, so difficult to reference. But according to Wikipedia at least, JavaScript is normative in it. No idea if SOAP / Web Services is part of it though.
The spec for PDF 1.7 is here: https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PD...

JavaScript is allowed, but not in PDF/A, which is what I use.

The PDF 2.0 spec is damnably not public.

But you can't easily tell PDF/A and regular PDF apart, so we're back to the same situation as HTML vs. HTML with JavaScript turned off.
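
For what it's worth, a file can at least be asked what it claims to be. PDF/A files declare themselves in their XMP metadata via a `pdfaid:part` property, so a crude check is to search the raw bytes for it. This is a heuristic sketch under several assumptions: that the metadata is stored uncompressed and in an ASCII-compatible encoding, and that the declaration is honest. It does not validate conformance, and the function name is made up.

```python
import re
from typing import Optional

# Matches both the element form <pdfaid:part>1</pdfaid:part>
# and the attribute form pdfaid:part="1" (or with single quotes).
_PDFA_PART = re.compile(rb'pdfaid:part\s*(?:=\s*["\x27]|>)\s*(\d+)')


def claimed_pdfa_part(pdf_bytes: bytes) -> Optional[int]:
    """Return the PDF/A part number the file claims, or None if no claim is found."""
    match = _PDFA_PART.search(pdf_bytes)
    return int(match.group(1)) if match else None


# Hypothetical fragments standing in for real file contents.
claims_pdfa = b"<rdf:Description><pdfaid:part>1</pdfaid:part></rdf:Description>"
plain_pdf = b"%PDF-1.7\n1 0 obj << /Type /Catalog >> endobj"

print(claimed_pdfa_part(claims_pdfa))
print(claimed_pdfa_part(plain_pdf))
```

None of this helps before you click the link, which is the point being made above; it only tells you after the download what the file says about itself.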
Are you sure? I was under the impression that PDFs can reference web resources, and this is why there are more stringent standards for archiving (PDF/A and friends)
> With HTML, I have to trust that some random entity does what they state in their privacy policy, and they regularly don't. Sure, I can disable JS, but then 95% of the web doesn't work anymore.

If you only allow PDF, then 99.9999% of the web doesn't work anymore.

I'm all for getting sites to be static, but PDF doesn't fix that because the problem has never been the technology used to build the site.

How sure are you that there are no network requests happening? I tried to look this up and wasn't able to find any clear answer.

(It looks like at least some PDF readers have provided support for automatically displaying external images, for example)
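
One rough way to get a partial answer for a specific file is to scan it for the PDF name tokens tied to scripting and external actions (`/JavaScript`, `/JS`, `/Launch`, `/SubmitForm`, `/URI`). The sketch below is only a heuristic and the function name is made up: it will miss anything inside compressed object streams or written with hex-escaped names, and `/URI` also shows up for ordinary hyperlinks that do nothing until clicked.

```python
import re
from typing import Set

# Name tokens for actions that can run scripts or reach outside the file.
# The lookahead keeps /JS from matching longer names such as /JSON.
_ACTION_NAMES = re.compile(
    rb"/(JavaScript|JS|Launch|SubmitForm|URI)(?![A-Za-z0-9])"
)


def suspicious_names(pdf_bytes: bytes) -> Set[str]:
    """Return the set of action-related name tokens found in the raw bytes."""
    return {m.group(1).decode("ascii") for m in _ACTION_NAMES.finditer(pdf_bytes)}


# Hypothetical object fragments standing in for real file contents.
scripted = b"1 0 obj << /S /JavaScript /JS (app.alert(1)) >> endobj"
quiet = b"1 0 obj << /Type /Page /JSON 1 >> endobj"

print(sorted(suspicious_names(scripted)))
print(sorted(suspicious_names(quiet)))
```

An empty result is not proof that a file is inert, so this is better treated as a quick smell test than as a guarantee.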

The full PDF spec is insane and allows for web requests and JavaScript. Most readers do not implement the anti-features, but Adobe's tools will.
You are fingerprinted when you find the web link.
When I click a link you mean? Definitely true, but that way they only have access to my IP and user agent, which is still better than all the WebGL, Font library, display calibration settings, mouse movement etc. that they use otherwise.

I often use Tor, although I'm pretty sure that even then, a good analytics lib can see it's me based on scroll behaviour, mouse movement, time of day, and of course what I browse.

But yeah, you make a good point.

Where do you get the link?
DDG mostly, and they don't track users.
Your device, your device version, screen size, browser, browser version, IP address, etc. are all tracked regardless.

You might not be a unique fingerprint, but at best you are part of a group of somewhere between 3 and 1000 similar users.

Not to be a downer, but when I webscraped I learned that big corporations can spend money to fingerprint you.

Why?
You can simply not use JS on your website.
> I certainly get the argument, but using something like hugo or gatsby or jekyll […]

Or a plug-in for WordPress, so you can keep the dynamic GUI for the less technical employees:

* https://wordpress.org/plugins/simply-static/