Hacker News
by Nux 5052 days ago
Ok, I'm not a developer and I'm probably missing the big picture or something. My question: is all this fuss about hosting a few text files, none bigger than several KB?

Who in this world can NOT afford to host a few small files nowadays?

3 comments

Performance is dominated more by latency than by file size, and folks usually care about this for page load time rather than bandwidth cost. CDNs are usually better than hosting on your own server directly, because they have better latency properties (many CDN endpoints, which are more likely to be "closer" to the end user).

The advantage of sharing a CDN (as opposed to every site having its own) is caching. If I visit website-a.com and they include jquery.js from cdnjs.com, and then I visit website-b.com, which uses the same file, I don't have to download it twice, and website-b.com loads faster than if it served that file itself. That's the big win, IMO, but unfortunately it depends on cdnjs having a high density of use. Even so, if you can shave 100ms off the loading of your page, that can matter.
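
To make the shared-cache point concrete, here is a toy sketch of a URL-keyed browser cache. The `BrowserCache` class, `fake_fetch`, and the `.example` URLs are all made up for illustration; real browser caches are far more involved (expiry, validation, and so on).

```python
class BrowserCache:
    """Toy URL-keyed cache that counts how many network downloads happen."""

    def __init__(self):
        self.store = {}
        self.downloads = 0

    def load(self, url, fetch):
        # Only hit the network when the URL has not been seen before.
        if url not in self.store:
            self.store[url] = fetch(url)
            self.downloads += 1
        return self.store[url]


def fake_fetch(url):
    # Stand-in for a network request (assumption: no real I/O here).
    return b"/* contents of " + url.encode() + b" */"


def visit(cache, script_urls):
    for url in script_urls:
        cache.load(url, fake_fetch)


# Both sites reference the same (hypothetical) shared CDN URL.
shared = BrowserCache()
visit(shared, ["https://cdnjs.example/jquery.js"])  # website-a
visit(shared, ["https://cdnjs.example/jquery.js"])  # website-b

# Each site serves its own copy, so the URLs differ.
self_hosted = BrowserCache()
visit(self_hosted, ["https://website-a.example/jquery.js"])
visit(self_hosted, ["https://website-b.example/jquery.js"])

print(shared.downloads, self_hosted.downloads)
```

With a shared URL the second site costs no download; with per-site copies the identical file is fetched once per site.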

The idea is that users wouldn't have to download the same jquery or whatever script over and over for every site they visit. It makes less sense when it comes to not-so-popular script files, but I guess there's a convenience factor too.

We need to extend the baseline notion of what the web is. If some nontrivial number of sites are using (say) jQuery, then it would be a good idea to have a way to declare "SCRIPT SRC jQuery version x.y.z" and let the browser figure out where it lives. Then you fetch it once, parse it once, and run it many times, no matter what site you may be visiting.

Or at the very least, we need some way to say "get this script from this URL, but only if it hashes to <this value>, since otherwise it's been compromised". Why worry about CDNs when you can design the script-switcheroo attack right out of the system in the first place?
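
A minimal sketch of the "only if it hashes to this value" check, assuming SHA-256 as the hash. The function name `script_is_trusted` and the sample script bodies are hypothetical, not anything a browser actually exposes.

```python
import hashlib


def script_is_trusted(body: bytes, expected_sha256_hex: str) -> bool:
    """Return True only if the downloaded bytes hash to the pinned value."""
    actual = hashlib.sha256(body).hexdigest()
    return actual == expected_sha256_hex.lower()


# The page author pins the hash of the script they reviewed.
original = b"console.log('hello');"
pinned = hashlib.sha256(original).hexdigest()

# Someone swaps the file on the host.
tampered = b"console.log('hello'); steal();"

print(script_is_trusted(original, pinned))
print(script_is_trusted(tampered, pinned))
```

The page would refuse to run anything for which the check fails, no matter which host served it.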

I wrote about this in June: http://rachelbythebay.com/w/2012/06/27/src/

It seems to me that the best way to handle this would be a content-addressable system with a more traditional fallback. You'd declare that you want the file with a SHA-256 (or whatever) hash of XYZ. If the browser has it, then you're done. If the browser knows where to find XYZ, then it can go off and grab it however it feels like. For compatibility, you'd also specify one or more traditional URLs where you think that content XYZ can be found, and the browser could use the hash to verify integrity.
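
Roughly, the lookup described above could go like this. It's only a sketch: `resolve`, `fake_fetch`, the in-memory `network` dict, and the `.example` URLs are all assumptions standing in for whatever a browser would really do.

```python
import hashlib


def resolve(expected_hex, cache, fallback_urls, fetch):
    """Find content by hash: local cache first, then verified fallback URLs."""
    if expected_hex in cache:
        return cache[expected_hex]  # already have it, no network needed
    for url in fallback_urls:
        body = fetch(url)
        # Accept the download only if it hashes to the requested value.
        if body is not None and hashlib.sha256(body).hexdigest() == expected_hex:
            cache[expected_hex] = body
            return body
    return None  # nothing trustworthy found


good = b"function lib() {}"
digest = hashlib.sha256(good).hexdigest()

# Pretend network: the first host serves a modified file.
network = {
    "https://cdn.example/lib.js": b"function lib() { evil(); }",
    "https://mysite.example/lib.js": good,
}
calls = []


def fake_fetch(url):
    calls.append(url)
    return network.get(url)


cache = {}
first = resolve(digest, cache, list(network), fake_fetch)
second = resolve(digest, cache, list(network), fake_fetch)
print(first == good, second == good, len(calls))
```

The tampered copy is rejected, the fallback URL supplies the real file, and the second request is served from the hash-keyed cache without touching the network.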

The trouble with the current CDN setup is that you only get the maximum benefit if everybody uses the same CDN, but people don't necessarily want to trust Google or whoever to host code that their site relies on. With a content-addressable system with fallbacks, you'd get all the benefits of the CDNs with none of the drawbacks.

I think we already have this system - it's the browser's cache. Consider the following sequence of steps:

1) add the "SIG" values to the tags in the HTML page that have a SRC attribute. Be it scripts, or images, or iframes, or what not. So far, we've just bloated the page a bit with no good effect for the user.

2) update the code in the browser to calculate for each resource in the browser's cache the values of a few "frequently used" signatures, and allow the signature-based access to the content [compressed trie?], in addition to indexing the cache by the URL. Now, the "bloat markup" from step 1 starts to kick in - and you can reuse web-wide all sorts of resources - scripts, downloadable fonts, artwork, whatever. At this point the user spends the time only on the first download of the file, even if that file comes from a very slow VPS.
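
A rough sketch of what step 2 could look like, assuming a single SHA-256 signature and plain dicts instead of a trie. `SignatureIndexedCache` and the `.example` URLs are invented for illustration and have nothing to do with the actual FF/Chromium cache code.

```python
import hashlib


class SignatureIndexedCache:
    """Cache reachable both by URL and by content signature."""

    def __init__(self):
        self._by_url = {}  # url -> signature
        self._by_sig = {}  # signature -> body (stored once)

    def store(self, url, body):
        sig = hashlib.sha256(body).hexdigest()
        self._by_url[url] = sig
        self._by_sig[sig] = body
        return sig

    def get_by_url(self, url):
        sig = self._by_url.get(url)
        return None if sig is None else self._by_sig.get(sig)

    def get_by_sig(self, sig):
        return self._by_sig.get(sig)


cache = SignatureIndexedCache()

# First download, from a (hypothetical) very slow VPS.
font = b"pretend these are font bytes"
sig = cache.store("https://slow-vps.example/fonts/nice.woff", font)

# Another site references a different URL but carries the same SIG value.
hit = cache.get_by_sig(sig)
miss = cache.get_by_url("https://other-site.example/assets/nice.woff")
print(hit == font, miss)
```

The URL lookup misses because the second site's URL was never fetched, but the signature lookup hits, so the file is not downloaded again.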

This approach could dramatically decrease the load on the CDNs for the frequently-repeated content, and get the latency down to near zero, so it would be much better than even the ISP-hosted CDNs.

Maybe someone familiar with the FF/Chromium codebase is reading this and can comment on the feasibility of such an approach?

I'm a bit confused, how does this qualify as "we already have this system" when your step #2 is "update the code in the browser"?

Aside from that, yes, this sounds pretty much like what I'm thinking of.

I thought you had in mind a totally new content delivery mechanism to fetch the data by hash from the network. Relative to that, adding content addressability to the browser cache is near trivial. Apologies if I misunderstood.

Ok. So I didn't miss anything obvious. :-) This might have been handy back in the dial-up days, but not now. Now I see it as a security and stability risk.

CDNs get the content cached closer to the end user than you are likely to be yourself.

Round trip time is especially important when you're loading several assets in a single page.

CDNs also often provide higher availability than you can provide yourself.

Using someone else's CDN is considerably cheaper than serving resources yourself, e.g. using jQuery from Google's CDN. Bandwidth costs for "small" JavaScript files start to add up once you have 100 million people loading them each day.
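
Back-of-the-envelope version of that, as a sketch. The 100 million loads per day comes from the comment above; the 30 KB file size is purely an assumed figure, so plug in your own.

```python
def daily_bytes(loads_per_day: int, file_size_bytes: int) -> int:
    """Total bytes served per day if every load downloads the full file."""
    return loads_per_day * file_size_bytes


LOADS_PER_DAY = 100_000_000  # from the comment above
FILE_SIZE_BYTES = 30_000     # assumption: a ~30 KB script, no caching

total = daily_bytes(LOADS_PER_DAY, FILE_SIZE_BYTES)
terabytes = total / 1_000_000_000_000  # decimal terabytes
print(f"{terabytes:.1f} TB per day")
```

Under those assumptions a "small" file works out to about 3 TB of transfer per day.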

It's more about client performance through using shared cache assets than cost. Though I always worry about someone at a supposedly secure system (online banking, for example) thinking it's a bright idea as well.