by saagarjha 837 days ago
Even putting aside CORS because I don’t even want to think about how this plays well with requests to another (tracking?) domain, this still doesn’t seem worth it. The explicit use case seems to be that it basically tells the server when you last visited the site based on which dictionary you have and then it gives you the moral equivalent of a delta update. Except, most browsers are working hard to expire data of this kind for privacy reasons. What’s the lifetime of these dictionaries going to be? I can see it being ok if it’s like 1 day but if this outlives how long cookies are stored it’s a significant privacy problem. The user visits the site again and essentially a cookie gets sent to the server? The page says “don’t put user-specific data in the request” but like nobody is stopping a website from doing this.

I think fingerprinting using this is mostly like the more direct ways to fingerprint with the cache, and the defenses against one are the defenses against the other.

For the cross-site thing, cache partitioning is the defense. If the cache of facebook.com/file is independent for a.com and b.com, Facebook can't link the visits.
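To make the partitioning point concrete, here is a minimal sketch, assuming a toy cache keyed by the pair (top-level site, resource URL). The class and method names are made up for illustration and are not any browser's actual API; real browsers use more detailed keys.

```python
from typing import Dict, Optional, Tuple


class PartitionedCache:
    """Toy HTTP cache keyed by (top-level site, resource URL). Hypothetical."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], bytes] = {}

    def put(self, top_level_site: str, url: str, body: bytes) -> None:
        # The same URL is stored separately for each top-level site.
        self._entries[(top_level_site, url)] = body

    def get(self, top_level_site: str, url: str) -> Optional[bytes]:
        return self._entries.get((top_level_site, url))


cache = PartitionedCache()
cache.put("a.com", "https://facebook.com/file", b"copy fetched while on a.com")

# Same resource URL, different embedding site: the lookup misses.
print(cache.get("a.com", "https://facebook.com/file"))
print(cache.get("b.com", "https://facebook.com/file"))
```

Because the embedding site is part of the key, whatever was cached during the visit to a.com is invisible during the visit to b.com, so the cached bytes (or a hash of them) can't serve as a cross-site identifier.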

An attacker using the hash of a cached resource as a pseudo-cookie could previously use the content of the resource as the pseudo-cookie. The Use-As-Dictionary wildcard allows cleverer implementations, but it seems like you can fingerprint for the same time period/in the same circumstances as before. In both cases you might do your tracking by ignoring how you're supposed to be using the feature; as you note, no one's stopping you.

Both before and after this compression feature, it is true that anti-tracking laws, etc. should address tracking via persistent storage in general, not only cookies, much as they need to handle localStorage or other hiding places for data. It is also true that for a browser to robustly defend against linking two visits to the same domain (or to limit the possibility of tracking to a certain time period, session, origin, etc.), caching is one of the things it has to limit.

I think if they get the expiry, partitioning, etc. right (or wrong) for stopping cache fingerprinting, they also get it right (or wrong) for this.

I was admittedly a fan of the original SDCH that didn't take off, figuring that inter-resource redundancy is a thing. It's a neat spin on it to use the compression algo history windows instead of purpose-built diff tools, and to use the existing cache instead of a dictionary store to the side. Seems easier to implement on both ends compared to the previous try. I could see this being helpful for quickly kicking off page load, maybe especially for non-SPAs and imperfectly optimized sites that repeat a not-tiny header across loads.
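A rough sketch of the "history window as dictionary" idea, using zlib's preset-dictionary support from the Python standard library as a stand-in. The actual proposal targets Brotli and Zstandard, and the page contents and helper names here are invented for illustration.

```python
import zlib


def compress(data: bytes, dictionary: bytes = b"") -> bytes:
    """Deflate `data`, optionally priming the window with `dictionary`."""
    if dictionary:
        c = zlib.compressobj(level=9, zdict=dictionary)
    else:
        c = zlib.compressobj(level=9)
    return c.compress(data) + c.flush()


def decompress(blob: bytes, dictionary: bytes = b"") -> bytes:
    """Inflate `blob`; the same dictionary must be supplied if one was used."""
    if dictionary:
        d = zlib.decompressobj(zlib.MAX_WBITS, dictionary)
    else:
        d = zlib.decompressobj()
    return d.decompress(blob) + d.flush()


# Made-up example: two pages that share a not-tiny header.
shared_header = "".join(
    f'<link rel="stylesheet" href="/static/css/module-{(i * 7919) % 1000}.css">\n'
    for i in range(40)
).encode()
old_page = shared_header + b"<p>first article</p>\n"   # already in the cache
new_page = shared_header + b"<p>second article</p>\n"  # being requested now

plain = compress(new_page)
delta = compress(new_page, dictionary=old_page)

print(len(new_page), len(plain), len(delta))
assert decompress(delta, old_page) == new_page
```

The client can only decode `delta` if it still holds the exact bytes of `old_page`, which is why the real mechanism has the client tell the server which dictionary it has.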

I think I’d feel better with a fixed set of dictionaries based on a corpus that gets updated every year to match new patterns of traffic and specifications. Even if it’s less efficient.
Ya. Where is accept-encoding: zstandard-d-es2024

Where it encodes js files with a known dictionary that is ideal for es2024

And here’s one tuned for react, and one for svelte…
That wouldn’t make sense as it would be the user agent (aka your browser) that implements these shared dictionaries and they wouldn’t be able to add non-standard shared dictionaries for libs like react.

If they could do that then they might as well preload the cache with all common libs like react from well known cdn urls.

Committee decided set of dictionaries.

I never cared for react, but I know beyond a doubt that someone influential will ask for a dictionary tuned for it.

Currently the max is temporarily capped at 30 days; otherwise it would work for as long as the dictionary is in the cache.
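As a sketch of that rule, assuming the cap is measured from when the dictionary was stored (the linked source is truncated here, so that detail and the function name are assumptions):

```python
from datetime import datetime, timedelta

# The 30-day figure comes from the comment above; described there as temporary.
MAX_DICTIONARY_AGE = timedelta(days=30)


def dictionary_usable(stored_at: datetime, now: datetime, still_in_cache: bool) -> bool:
    """A dictionary is usable only while cached and no older than the cap."""
    return still_in_cache and (now - stored_at) <= MAX_DICTIONARY_AGE


stored = datetime(2024, 1, 1)
print(dictionary_usable(stored, datetime(2024, 1, 20), still_in_cache=True))
print(dictionary_usable(stored, datetime(2024, 3, 1), still_in_cache=True))
print(dictionary_usable(stored, datetime(2024, 1, 20), still_in_cache=False))
```

Without the cap, only the `still_in_cache` condition would remain.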

https://source.chromium.org/chromium/chromium/src/+/main:ser...

> Dictionary entries (or at least the metadata) should be cleared any time cookies are cleared.

So it seems it should not get you anything you cannot already do with cookies.

https://github.com/WICG/compression-dictionary-transport/blo...

It's interesting this is mentioned specifically about the metadata used by this feature: fingerprinting using this feature has similarities with other cache fingerprinting (wrote a sibling comment about that).

It's not actively bad to have defense-in-depth measures at the level of the dictionary feature. But if your implementation of dictionaries using your browser's existing cache policies is a privacy problem, I'd consider changing the cache, not just the shared-dictionary implementation.

The dictionaries are partitioned by document and origin so a "tracking" domain will only be able to correlate requests within a given document origin and not across sites.

They are also cleared any time cookies are cleared and don't outlive what you can do today with cookies or Etags (and are using the most restrictive partitioning for that reason).