by dasony 5045 days ago
The idea is that users wouldn't have to download the same jquery or whatever script over and over for every site they visit. It makes less sense when it comes to not-so-popular script files, but I guess there's a convenience factor too.

We need to extend the baseline notion of what the web is. If some nontrivial number of sites are using (say) jQuery, then it would be a good idea to have a way to declare "SCRIPT SRC jQuery version x.y.z" and let the browser figure out where it lives. Then you fetch it once, parse it once, and run it many times, no matter what site you may be visiting.

Or at the very least, we need some way to say "get this script from this URL, but only if it hashes to <this value>, since otherwise it's been compromised". Why worry about CDNs when you can design the script-switcheroo attack right out of the system in the first place?
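A minimal sketch of that "only if it hashes to this value" check, assuming SHA-256 and a hex-encoded digest declared alongside the URL. The function name and the choice of hash are illustrative assumptions, not anything a browser actually exposes:

```python
import hashlib


def script_matches_hash(body: bytes, expected_sha256_hex: str) -> bool:
    """Return True only if the downloaded bytes hash to the declared value.

    `expected_sha256_hex` stands in for whatever the page author would
    declare next to the script URL (a hypothetical attribute).
    """
    actual = hashlib.sha256(body).hexdigest()
    return actual == expected_sha256_hex.lower()
```

A browser following this idea would refuse to run the script whenever the check returns False, no matter which server delivered the bytes.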

I wrote about this in June: http://rachelbythebay.com/w/2012/06/27/src/

It seems to me that the best way to handle this would be a content-addressable system with a more traditional fallback. You'd declare that you want the file with a SHA-256 (or whatever) hash of XYZ. If the browser has it, then you're done. If the browser knows where to find XYZ, then it can go off and grab it however it feels like. For compatibility, you'd also specify one or more traditional URLs where you think that content XYZ can be found, and the browser could use the hash to verify integrity.
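Roughly, the lookup order described above could look like the sketch below. Everything here is hypothetical: `resolve`, the dict standing in for the browser's store, and the injected `fetch` callable are assumptions made so the example runs without a network.

```python
import hashlib
from typing import Callable, Dict, Iterable, Optional


def resolve(
    expected_sha256_hex: str,
    fallback_urls: Iterable[str],
    cache: Dict[str, bytes],
    fetch: Callable[[str], bytes],
) -> Optional[bytes]:
    """Find content by hash: local store first, then the fallback URLs."""
    # 1. If we already hold content with this hash, we're done.
    if expected_sha256_hex in cache:
        return cache[expected_sha256_hex]

    # 2. Otherwise try each traditional URL and verify what comes back.
    for url in fallback_urls:
        try:
            body = fetch(url)
        except OSError:
            continue  # host unreachable, try the next one
        if hashlib.sha256(body).hexdigest() == expected_sha256_hex:
            cache[expected_sha256_hex] = body  # remember it for every site
            return body

    # 3. Nothing matched: treat the resource as unavailable.
    return None
```

Because every candidate is verified against the hash, it doesn't matter whether a fallback host is trusted; a tampered copy is simply skipped.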

The trouble with the current CDN setup is that you only get the maximum benefit if everybody uses the same CDN, but people don't necessarily want to trust Google or whoever to host code that their site relies on. With a content-addressable system with fallbacks, you'd get all the benefits of the CDNs with none of the drawbacks.

I think we already have this system - it's the browser's cache. Consider the following sequence of steps:

1) add the "SIG" values to the tags in the HTML page that have a SRC attribute, be it scripts, images, iframes, or whatnot. So far, we've just bloated the page a bit with no benefit for the user.

2) update the code in the browser to calculate for each resource in the browser's cache the values of a few "frequently used" signatures, and allow the signature-based access to the content [compressed trie?], in addition to indexing the cache by the URL. Now, the "bloat markup" from step 1 starts to kick in - and you can reuse web-wide all sorts of resources - scripts, downloadable fonts, artwork, whatever. At this point the user spends the time only on the first download of the file, even if that file comes from a very slow VPS.

This approach could dramatically decrease the load on the CDNs for the frequently-repeated content, and get the latency down to near zero, so it would be much better than even the ISP-hosted CDNs.
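To make step 2 concrete, here is a toy sketch of a cache indexed both by URL and by signature. It assumes SHA-256 as the signature and plain dicts instead of a trie; the class and method names are made up for illustration and have nothing to do with the real FF/Chromium cache code.

```python
import hashlib
from typing import Dict, Optional


class HashIndexedCache:
    """Toy cache that can be queried by URL or by content hash."""

    def __init__(self) -> None:
        self._by_hash: Dict[str, bytes] = {}     # signature -> content
        self._url_to_hash: Dict[str, str] = {}   # URL -> signature

    def put(self, url: str, body: bytes) -> str:
        """Store a downloaded resource and return its signature."""
        digest = hashlib.sha256(body).hexdigest()
        self._by_hash[digest] = body  # identical content is stored once
        self._url_to_hash[url] = digest
        return digest

    def get_by_url(self, url: str) -> Optional[bytes]:
        """Classic lookup, as the cache works today."""
        digest = self._url_to_hash.get(url)
        return None if digest is None else self._by_hash.get(digest)

    def get_by_hash(self, digest: str) -> Optional[bytes]:
        """Signature lookup: works no matter which site first served it."""
        return self._by_hash.get(digest)
```

With this, a file first fetched from a slow VPS is found again on any other page that declares the same signature, even though the URL differs.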

Maybe someone reading this who is familiar with the FF/Chromium codebase could comment on the feasibility of such an approach?

I'm a bit confused, how does this qualify as "we already have this system" when your step #2 is "update the code in the browser"?

Aside from that, yes, this sounds pretty much like what I'm thinking of.

I thought you had in mind a totally new content delivery mechanism to fetch the data by hash from the network. Relative to that, adding content addressability to the browser cache is near trivial. Apologies if I misunderstood.

I was thinking that this could be added as well, but that it would be purely optional, if anyone got around to adding it.

Basically, you need the hash and a fallback regular URL. The browser is free to grab the content using the hash however it likes, whether it's grabbing it from its cache, using the fallback URL, using another known URL for that content, or using some new delivery mechanism.

Ok. So I didn't miss anything obvious. :-) This might have been handy back in the dial-up days, but not now. Now I see it as a security and stability risk.

CDNs get the content cached closer to the end user than you are likely to be yourself.

Round trip time is especially important when you're loading several assets in a single page.

CDNs also often provide higher availability than you can provide yourself.

Using someone else's CDN is considerably cheaper than serving resources yourself, e.g. loading jQuery from Google's CDN. Bandwidth costs for "small" JavaScript files start to add up once you have 100 million people loading them each day.
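A back-of-the-envelope sketch of how that adds up. The 30 KB file size is purely an assumed figure for illustration (not a measured size of any library), and the calculation ignores browser caching, so treat it as an upper bound under those assumptions:

```python
def daily_transfer_bytes(file_size_bytes: int, loads_per_day: int) -> int:
    """Bytes served per day if every load downloads the whole file."""
    return file_size_bytes * loads_per_day


# Assumed: a 30 KB script, loaded by 100 million people each day.
total = daily_transfer_bytes(30_000, 100_000_000)
terabytes_per_day = total / 1e12  # 3.0 with the assumed numbers
```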