Hacker News
by float4 1803 days ago
The only thing I like about PDF compared to HTML is that with PDF, I know for a fact that no web requests are made in the background. That means no fingerprinting, no analytics etc.

With HTML, I have to trust that some random entity does what they state in their privacy policy, and they regularly don't. Sure, I can disable JS, but then 95% of the web doesn't work anymore.

Other than that, PDF is quite clearly a less accessible format.

6 comments

How do you know for a fact? PDF has JS in the spec, and it supports SOAP and Web Services. Have a look at https://www.adobe.com/go/acrobatsdk_jsdevguide
That's not the PDF spec, is it? That is a spec for Adobe Acrobat, which is not allowed to make any web requests thanks to my application firewall (Little Snitch).

Pretty sure a PDF opened in the browser can't run any JS, but not completely sure. So you're right: I don't really know it for a fact. Poor choice of words.

The spec is ISO 32000, and it's expensive and closed, so it's difficult to reference. But according to Wikipedia at least, JavaScript is normative in it. No idea whether SOAP/Web Services are part of it, though.
The spec for PDF 1.7 is here: https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PD...

JavaScript is allowed, but not in PDF/A, which is what I use.

The PDF 2.0 spec is damnably not public.

But you can't easily tell PDF/A and regular PDF apart, so we're back to the same situation as HTML vs. HTML with JavaScript turned off.
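To illustrate how thin that distinction is from the outside: a PDF/A file announces itself through `pdfaid:part` and `pdfaid:conformance` entries in its XMP metadata, and that is only a claim made by whatever produced the file. Below is a rough Python sketch that reads the claim with a regex, assuming the XMP packet is stored uncompressed in the file. The helper name `claimed_pdfa_level` is hypothetical, and checking whether a file actually conforms needs a real validator such as veraPDF.

```python
import re
from typing import Optional

# The PDF/A identification schema in XMP may be written as attributes
# (pdfaid:part="2") or as elements (<pdfaid:part>2</pdfaid:part>).
PART_RE = re.compile(rb"pdfaid:part\s*(?:=\s*[\"']|>)\s*(\d)")
CONFORMANCE_RE = re.compile(rb"pdfaid:conformance\s*(?:=\s*[\"']|>)\s*([A-Za-z])")


def claimed_pdfa_level(data: bytes) -> Optional[str]:
    """Return the PDF/A level a file *claims*, e.g. "PDF/A-2b", or None.

    Heuristic: searches the raw bytes, so it misses XMP stored in a
    compressed stream, and it never checks actual conformance.
    """
    part = PART_RE.search(data)
    if part is None:
        return None
    level = "PDF/A-" + part.group(1).decode("ascii")
    conformance = CONFORMANCE_RE.search(data)
    if conformance is not None:
        # Conformance levels are conventionally written lowercase, e.g. 1b, 2u.
        level += conformance.group(1).decode("ascii").lower()
    return level
```

A non-None result only tells you what was written into the metadata; nothing stops a regular PDF that carries JavaScript from containing the same line.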
Are you sure? I was under the impression that PDFs can reference web resources, and that this is why there are more stringent standards for archiving (PDF/A and friends).
> With HTML, I have to trust that some random entity does what they state in their privacy policy, and they regularly don't. Sure, I can disable JS, but then 95% of the web doesn't work anymore.

If you only allow PDF, then 99.9999% of the web doesn't work anymore.

I'm all for getting sites to be static, but PDF doesn't fix that because the problem has never been the technology used to build the site.

How sure are you that there are no network requests happening? I tried to look this up and wasn't able to find any clear answer.

(It looks like at least some PDF readers have provided support for automatically displaying external images, for example.)

The full PDF spec is insane and allows for web requests and JavaScript. Most readers do not implement these anti-features, but Adobe's tools will.
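For a sense of what "allows for web requests and JavaScript" looks like inside a file: that behaviour is requested through name keys such as /OpenAction, /JavaScript, /JS, /URI or /SubmitForm. Below is a rough Python heuristic, similar in spirit to Didier Stevens' pdfid, that counts those keys in the raw bytes and in FlateDecode streams. The key list and helper names are illustrative assumptions, and this is a byte-level scan rather than a PDF parser: it can miss content (encrypted files, streams using other filters) and over-count names that happen to appear inside strings. A /URI hit is often just an ordinary clickable link, and any hit only means the document asks for the behaviour; whether anything happens still depends on the reader.

```python
import re
import zlib
from collections import Counter
from typing import Iterator

# Name keys that ask the reader to run script, open a URL, launch something,
# or send/fetch data. Illustrative choice, not an exhaustive list.
ACTIVE_KEYS = {
    b"JavaScript", b"JS", b"OpenAction", b"AA", b"Launch",
    b"URI", b"SubmitForm", b"GoToR", b"ImportData", b"RichMedia",
}

# A PDF name: "/" followed by regular characters (no whitespace or delimiters).
NAME_RE = re.compile(rb"/([^\x00\t\n\f\r ()<>\[\]{}/%]+)")
# Names may hide characters as #xx hex escapes, e.g. /J#61vaScript.
HEX_ESCAPE_RE = re.compile(rb"#([0-9A-Fa-f]{2})")
# Crude stream finder; a real parser would honour the /Length entry instead.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def _decode_name(raw: bytes) -> bytes:
    """Undo #xx escapes so /J#61vaScript compares equal to /JavaScript."""
    return HEX_ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)


def _inflated_streams(data: bytes) -> Iterator[bytes]:
    """Yield stream bodies that inflate as zlib data (FlateDecode)."""
    for match in STREAM_RE.finditer(data):
        try:
            yield zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate-encoded (or damaged): skip it


def scan_active_content(data: bytes) -> Counter:
    """Count active-content keys in the raw file and in inflated streams."""
    hits: Counter = Counter()
    for chunk in [data, *_inflated_streams(data)]:
        for match in NAME_RE.finditer(chunk):
            name = _decode_name(match.group(1))
            if name in ACTIVE_KEYS:
                hits[name.decode("latin-1")] += 1
    return hits
```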
You are fingerprinted when you find the web link.
When I click a link, you mean? Definitely true, but that way they only have access to my IP and user agent, which is still better than all the WebGL, font library, display calibration settings, mouse movement, etc. that they use otherwise.

I often use Tor, although I'm pretty sure that even then, a good analytics lib can see it's me based on scroll behaviour, mouse movement, time of day, and of course what I browse.

But yeah, you make a good point.

Where do you get the link?
DDG mostly, and they don't track users.
Your device, your device version, screen size, browser, browser version, IP address, etc. are all tracked regardless.

You might not be a unique fingerprint, but at best you are part of a group of somewhere between 3 and 1000 similar users.
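To put numbers on "a group of somewhere between 3 and 1000": a common way to measure a fingerprint is in bits of identifying information, log2(N / k) for an attribute combination shared by k out of N users, with log2(k) bits left for other signals to fill in. A small Python sketch; the function names are hypothetical, and the population of 1,000,000 in the note below is a made-up figure for illustration.

```python
import math


def fingerprint_bits(population: int, anonymity_set: int) -> float:
    """Identifying information, in bits, of a fingerprint shared by
    `anonymity_set` users out of `population`: log2(population / anonymity_set)."""
    if not 1 <= anonymity_set <= population:
        raise ValueError("need 1 <= anonymity_set <= population")
    return math.log2(population / anonymity_set)


def bits_still_needed(anonymity_set: int) -> float:
    """Extra bits (e.g. from IP address or behaviour) needed to single out
    one user within the anonymity set: log2(anonymity_set)."""
    if anonymity_set < 1:
        raise ValueError("anonymity_set must be at least 1")
    return math.log2(anonymity_set)
```

With N = 1,000,000, a group of 1000 gives about 9.97 bits, leaving about 9.97 bits for IP address or behaviour to supply; a group of 3 gives about 18.35 bits, leaving only about 1.58.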

Not to be a downer, but when I did web scraping, I learned that big corporations can spend money to fingerprint you.

Why?
You can choose not to use JS on your website.