Hacker News
by mikeash 5047 days ago
It seems to me that the best way to handle this would be a content-addressable system with a more traditional fallback. You'd declare that you want the file with a SHA-256 (or whatever) hash of XYZ. If the browser already has it, then you're done. If the browser knows where to find XYZ, then it can go off and grab it however it feels like. For compatibility, you'd also specify one or more traditional URLs where you think content XYZ can be found, and the browser would use the hash to verify the integrity of whatever it downloads.
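To make the fallback part concrete, here is a minimal Python sketch of "try the traditional URLs, accept only content that matches the hash". The names `sha256_hex` and `fetch_verified` are made up for illustration, and the `fetch` callable stands in for whatever the browser's network stack would do; none of this is an existing browser API.

```python
import hashlib
from typing import Callable, Iterable, Optional


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def fetch_verified(
    expected_hash: str,
    fallback_urls: Iterable[str],
    fetch: Callable[[str], bytes],
) -> Optional[bytes]:
    """Try each fallback URL in order; return the first body whose hash matches.

    `fetch` is a hypothetical downloader supplied by the caller. It is
    assumed to raise OSError when a URL cannot be reached.
    """
    wanted = expected_hash.lower()
    for url in fallback_urls:
        try:
            body = fetch(url)
        except OSError:
            continue  # unreachable mirror, try the next URL
        if sha256_hex(body) == wanted:
            return body  # integrity verified
        # Hash mismatch: the content is stale or tampered with, so skip it.
    return None
```

Since the hash check happens regardless of where the bytes came from, the page author doesn't have to trust any particular host in the list.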

The trouble with the current CDN setup is that you only get the maximum benefit if everybody uses the same CDN, but people don't necessarily want to trust Google or whoever to host code that their site relies on. With a content-addressable system with fallbacks, you'd get all the benefits of the CDNs with none of the drawbacks.


I think we already have this system - it's the browser's cache. Consider the following sequence of steps:

1) Add "SIG" values to the tags in the HTML page that have a SRC attribute, be they scripts, images, iframes, or whatnot. So far, we've just bloated the page a bit with no benefit for the user.

2) Update the code in the browser to calculate, for each resource in the browser's cache, the values of a few "frequently used" signatures, and allow signature-based access to the content [compressed trie?] in addition to indexing the cache by URL. Now the "bloat markup" from step 1 starts to pay off: you can reuse all sorts of resources web-wide, whether scripts, downloadable fonts, artwork, or whatever. At this point the user spends time only on the first download of the file, even if that file comes from a very slow VPS.
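A rough Python sketch of what step 2 might look like as a data structure: one store, indexed both by URL and by signature. The class name, the method names, and the choice of SHA-256 and SHA-512 as the "frequently used" signatures are all assumptions for illustration; a plain dict is used here where a real browser might pick a trie or an on-disk index.

```python
import hashlib
from typing import Dict, Optional, Tuple

# Assumed set of "frequently used" signatures; the thread does not name any.
ALGORITHMS = ("sha256", "sha512")


class SignatureIndexedCache:
    """Toy cache that can be queried by URL or by (algorithm, digest)."""

    def __init__(self) -> None:
        self._by_url: Dict[str, bytes] = {}
        self._by_sig: Dict[Tuple[str, str], bytes] = {}

    def store(self, url: str, body: bytes) -> None:
        """Cache body under its URL and under each precomputed signature."""
        self._by_url[url] = body
        for algo in ALGORITHMS:
            digest = hashlib.new(algo, body).hexdigest()
            # Both indexes reference the same bytes object; nothing is copied.
            self._by_sig[(algo, digest)] = body

    def get_by_url(self, url: str) -> Optional[bytes]:
        return self._by_url.get(url)

    def get_by_signature(self, algo: str, digest: str) -> Optional[bytes]:
        return self._by_sig.get((algo, digest.lower()))
```

With this, a file first downloaded from one site's slow VPS would be a cache hit for any other page that names the same signature, whatever URL that page points at.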

This approach could dramatically decrease the load on the CDNs for the frequently-repeated content, and get the latency down to near zero, so it would be much better than even the ISP-hosted CDNs.

Maybe someone reading this who is familiar with the FF/Chromium codebase could comment on the feasibility of such an approach?

I'm a bit confused, how does this qualify as "we already have this system" when your step #2 is "update the code in the browser"?

Aside from that, yes, this sounds pretty much like what I'm thinking of.

I thought you had in mind a totally new content delivery mechanism to fetch the data by hash from the network. Relative to that, adding content addressability to the browser cache is near trivial. Apologies if I misunderstood.

I was thinking that this could be added as well, but that it would be purely optional, if anyone ever got around to adding it.

Basically, you need the hash and a fallback regular URL. The browser is free to grab the content using the hash however it likes, whether it's grabbing it from its cache, using the fallback URL, using another known URL for that content, or using some new delivery mechanism.
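One way to picture "however it likes" is as an ordered list of pluggable sources, each of which is handed the hash and may or may not come back with bytes. This is only a sketch under that assumption; `Source` and `resolve` are hypothetical names, and SHA-256 is assumed as the hash.

```python
import hashlib
from typing import Callable, Iterable, Optional

# A source takes the wanted hash and returns candidate bytes, or None if it
# has nothing. The cache, the fallback URL, another known URL, or some new
# delivery mechanism would each be wrapped as one of these.
Source = Callable[[str], Optional[bytes]]


def resolve(expected_hash: str, sources: Iterable[Source]) -> Optional[bytes]:
    """Ask each source in order; return the first result matching the hash."""
    wanted = expected_hash.lower()
    for source in sources:
        body = source(wanted)
        if body is None:
            continue  # this mechanism doesn't have the content
        if hashlib.sha256(body).hexdigest() == wanted:
            return body
        # Wrong content from this source: ignore it and keep going.
    return None
```

The ordering is the browser's business. The page only supplies the hash and the fallback URL, so new mechanisms can be added later without touching any markup.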