Hacker News
by jrochkind1 2066 days ago
Can anyone find any data on how often cache hits actually happened for shared resources from CDNs? How useful was this in practice? I'm not confident it was a non-trivial portion of the bytes fetched in a typical session. But maybe it was. Has anyone found a way to measure it?
7 comments

Pretty poor. Even with a widely used library like jquery, version skew meant that there was pretty limited overlap. I collected notes on the issue some time ago: https://justinblank.com/notebooks/browsercacheeffectiveness.....
Agreed. I've stopped relying on CDN caching for my projects and instead try to focus on avoiding large js payloads entirely.
For the last half decade, almost all apps have been deployed with webpack/browserify/other bundles. All the assets get smashed together into a big custom bundle that doesn't get cached across sites.

This has been really sad & a big loss for the web, in my opinion. And it's one that we were about to emerge from[1], it seems like.

Alas, if we do go back to a more old-school CDN-based style of web scripting/javascripting, powered by our new ES Modules (& hopefully Import Maps), this new sharding-by-origin change will mean that we will never ever see the CDN hit-rate benefits we once saw.

It seems like it is a necessary change, to protect the user from being tracked, but it still hurts my heart so much, that we are so near to getting back to sharing resources on the web, only to have all that sharing snatched away. Whatever metrics you are looking at today, know that they represent a very sad state of affairs, that brought great pain & suffering to the hearts of many webdevs who aspired for much much much higher hit rates.

[1] https://www.bryanbraun.com/2020/10/23/es-modules-in-producti...
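
For anyone unfamiliar with the sharding-by-origin change, here is a minimal sketch of the difference between a shared cache and a partitioned one. The key shape (visiting site plus resource URL) is a simplification, and the site names and the library URL are hypothetical; real browsers use more detailed keys.

```javascript
// Build a toy cache whose key is computed by keyFn(topLevelSite, resourceUrl).
function makeCache(keyFn) {
  const store = new Set();
  return function load(topLevelSite, resourceUrl) {
    const key = keyFn(topLevelSite, resourceUrl);
    if (store.has(key)) return "hit";
    store.add(key);
    return "miss";
  };
}

// Old behaviour: one entry per URL, shared by every site.
const sharedLoad = makeCache((site, url) => url);
// Partitioned behaviour (simplified): one entry per (site, URL) pair.
const partitionedLoad = makeCache((site, url) => site + " " + url);

const lib = "https://cdn.example/jquery-3.5.1.min.js"; // hypothetical URL

const shared = [
  sharedLoad("https://a.example", lib),
  sharedLoad("https://b.example", lib),
];
const partitioned = [
  partitionedLoad("https://a.example", lib),
  partitionedLoad("https://b.example", lib),
  partitionedLoad("https://b.example", lib),
];

console.log(shared.join(", "));      // miss, hit
console.log(partitioned.join(", ")); // miss, miss, hit
```

With the shared cache the second site gets the library for free; with the partitioned one each site pays for its own copy and only repeat visits to the same site hit.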

I think the linked resources show that it probably never worked the way we hoped it would. The first investigation I linked was from 2011, prior to the existence of webpack.
A lot of people have high hopes that import maps[1] will allow us to consume a variety of ES Modules from a variety of CDNs effectively. They give us back the "bare specifiers" that CommonJS introduced, where you say `import $ from "jquery/index.js"` in the code you write. The import map then tells the browser which CDN (or other host) to reach out to for that index.js file. We think this will allow ES Modules to be broadly usable & "modular" in a way that they have not been. I & a bunch of others are holding our breath on this one. It feels like it really can fix this huge hang-up, giving us a way to author modules that allows modular consumption.

[1] https://wicg.github.io/import-maps/
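
As a rough illustration, here is a minimal sketch of how an import map turns a bare specifier into a URL. In a page the map would be JSON inside a `<script type="importmap">` element; here it is a plain object. The `cdn.example` host, the version, and the `resolveSpecifier` helper are all hypothetical, and the real resolution algorithm handles more cases (scopes, URL-like specifiers, and so on).

```javascript
// Hypothetical import map: the CDN host and version are made up for illustration.
const importMap = {
  imports: {
    "jquery": "https://cdn.example/jquery@3.5.1/index.js",
    "jquery/": "https://cdn.example/jquery@3.5.1/",
  },
};

// Simplified resolution: exact match first, then the longest key ending in "/"
// that prefixes the specifier. Scopes and URL-like specifiers are not handled.
function resolveSpecifier(specifier, map) {
  const imports = map.imports;
  if (Object.prototype.hasOwnProperty.call(imports, specifier)) {
    return imports[specifier];
  }
  const prefixes = Object.keys(imports)
    .filter((key) => key.endsWith("/") && specifier.startsWith(key))
    .sort((a, b) => b.length - a.length);
  if (prefixes.length === 0) {
    return null; // a browser would reject an unmapped bare specifier instead
  }
  const prefix = prefixes[0];
  return imports[prefix] + specifier.slice(prefix.length);
}

console.log(resolveSpecifier("jquery/index.js", importMap));
// https://cdn.example/jquery@3.5.1/index.js
```

The point is that the code you write only ever says `"jquery/index.js"`; which host serves it is decided in one place, by the map.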

Given that cache hits only work with a specific URL, the results are in practice anywhere from pointless to only slightly good (with maybe one or two exceptions).

I mean to have a cache hit you need:

- Same CDN

- Same library

- Same library uploader/name

- Exact same library version, down to every byte of JS

- Exact same way to refer to the given version (e.g. if latest is 1.3.2 then foobar-1.3.2 and foobar-latest are not the same, unless foobar-latest is a temporary redirect to foobar-1.3.2. But that would induce a further round trip).

If we furthermore consider that most people spend most of their time on a small number of domains, it's not too hard to reason that the value gained from shared caching doesn't outweigh the cost for the majority of users.
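
To make the list of conditions concrete, here is a minimal sketch of a cache keyed purely by the URL string. The hosts and file names are invented; only the last request satisfies every condition at once.

```javascript
// Toy HTTP cache keyed only by the full URL string (the pre-partitioning model).
// All URLs below are invented examples.
const cache = new Map();

function fetchThroughCache(url) {
  if (cache.has(url)) {
    return "hit";
  }
  cache.set(url, "<bytes of " + url + ">");
  return "miss";
}

const requests = [
  "https://cdn-a.example/foobar-1.3.2.js",  // first site to ask
  "https://cdn-b.example/foobar-1.3.2.js",  // same library, different CDN
  "https://cdn-a.example/foobar-1.3.1.js",  // same CDN, different version
  "https://cdn-a.example/foobar-latest.js", // same bytes, different name
  "https://cdn-a.example/foobar-1.3.2.js",  // exact same URL as the first
];

const results = requests.map(fetchThroughCache);
console.log(results.join(", ")); // miss, miss, miss, miss, hit
```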

My sense is that it was always pretty low. What are the odds two sites use the same exact version of jquery and same third party cdn while the cache is still warm?
I guess it really depends on the use case and the library.

I can imagine a tonne of WP sites referencing jQuery from the Google CDN, e.g.: https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.mi...

Looking at the headers, the JS asset would be cached for 1 year.

There are always a few exceptions.

But you also must consider that most people visit a small set of domains most of the time.

Which means that most of the time they will have jQuery and similar cached even without cross-domain caching.

I guess the CDNs assume people should use libraries without version pinning, i.e. "latest".
I feel that for a long time now this has greatly lost its usefulness. In a time when more websites are "webapps" built using webpack and other similar tools, we've seen a big decline in the use of CDNs.
Yeah -- for all that people are worried about efficiency gains, I'm kind of doubtful that most end-users, even on slow connections, will even notice that caches are restricted to within domains.

I suspect that websites that are conscious of loading times are already testing performance with nothing cached. And websites that aren't conscious of loading times are probably using bundling techniques that would already make cross-site caches useless. In both cases, I'm having a hard time believing that loading jQuery is the reason anyone's website is slow.

There are theoretical schemes that could allow us to share libraries between sites without having the same privacy impacts, but I'm not sure it's even worth the effort of proposing them.

I'm not sure how far this is technically possible, but for people who are on such slow or low-bandwidth connections that this change is a noticeable drawback, I believe there is a better solution:

An extension that keeps widely used versions of libraries preloaded, along with a small db of CDNs/URLs, so that it can serve the preloaded libraries instead of the CDN ones when possible. This could also do things like collapse foobar-latest and foobar-X.Y.Z (X.Y.Z == latest), and could force-load a different version with security patches. I.e. it would act kinda like a Linux package manager for a limited set of common libraries.
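
A minimal sketch of the lookup such an extension would need, assuming a small table of known CDN URL patterns and a bundled copy of each library. Every name, URL, and path here is hypothetical, and a real extension would hook this into the browser's request-interception machinery rather than call it directly.

```javascript
// Hypothetical alias table: which concrete version "latest" currently means.
const latestVersion = { foobar: "1.3.2" };

// Hypothetical index of bundled files, keyed by "library@version".
const bundled = {
  "foobar@1.3.2": "local/foobar-1.3.2.min.js",
};

// Hypothetical URL patterns for the CDNs the extension knows about.
// Group 1 is the library name, group 2 the version (or "latest").
const cdnPatterns = [
  /^https:\/\/cdn-a\.example\/([a-z]+)-(latest|\d+\.\d+\.\d+)\.min\.js$/,
  /^https:\/\/cdn-b\.example\/libs\/([a-z]+)\/(latest|\d+\.\d+\.\d+)\/\1\.min\.js$/,
];

// Return the path of a bundled copy, or null to let the request hit the network.
function findLocalCopy(url) {
  for (const pattern of cdnPatterns) {
    const match = pattern.exec(url);
    if (!match) continue;
    const name = match[1];
    let version = match[2];
    // Collapse foobar-latest onto the concrete version it points at.
    if (version === "latest") version = latestVersion[name];
    const key = name + "@" + version;
    return Object.prototype.hasOwnProperty.call(bundled, key) ? bundled[key] : null;
  }
  return null;
}

console.log(findLocalCopy("https://cdn-a.example/foobar-latest.min.js"));
console.log(findLocalCopy("https://cdn-b.example/libs/foobar/1.3.2/foobar.min.js"));
console.log(findLocalCopy("https://cdn-a.example/foobar-1.2.0.min.js"));
```

The first two requests resolve to the same bundled file even though they name different CDNs and spell the version differently; the third falls through to the network because that version is not bundled.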

Decentraleyes does exactly this.
Check out LocalCDN for a fork with actively-updated CDNs.
I use Decentraleyes and it tells me it has replaced network versions with local versions 395 times since installation (probably a year ago). That's not very much.

It doesn't get hits like it should. It would be nice to be able to add libraries to the cache manually because many are out of date.

Probably just like shared libraries. Ostensibly multiple applications will share the same DLL. In practice their versions are incompatible, if the software even uses the same dependency to begin with.