Hacker News
Subresource Integrity (githubengineering.com)
184 points by mastahyeti 3924 days ago
18 comments

What about caching? If the HTML and the JS are both updated, but the browser receives the new version of one and the old version of another, this will break your page. (Since you'd now have to update the integrity attribute for every JS change, it means you run this risk every time you update your JS.)

To be fair, running a mismatched version of the JS could already break things if the changes are big enough, but for minor updates, the user often won't notice the difference. Now, these cases are hard failures. That's not necessarily a bad thing, but I wonder if there's a path here to tell the browser "you have an old version of the content; go get the new version."

CDNs and invalidations can be tricky, and it sounds like this could lead to things being broken more often if you're caught in the window where one piece updates before the other.

This isn't a concern with our implementation because a hash of the asset bundle is also included in the URL. This is a pretty common cache-busting technique for static assets and lets you send more aggressive cache directives to the browser.
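A minimal sketch of that cache-busting scheme, assuming a hypothetical `fingerprinted_name` helper and SHA-256 truncated to 12 hex characters (GitHub's actual build pipeline isn't described in the thread):

```python
import hashlib
from pathlib import PurePosixPath

def fingerprinted_name(path: str, content: bytes, digest_len: int = 12) -> str:
    """Return e.g. 'assets/app-<hex>.js': the content hash becomes part of the URL."""
    p = PurePosixPath(path)
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    return str(p.with_name(f"{p.stem}-{digest}{p.suffix}"))

old = fingerprinted_name("assets/app.js", b"console.log('v1');")
new = fingerprinted_name("assets/app.js", b"console.log('v2');")
print(old)
print(new)  # a different URL, so no cache can hand back the v1 bytes for it
```

Because the URL changes whenever the bytes change, an updated page never points at a stale cached copy, which is what makes it safe to serve these files with very long cache lifetimes.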
D'oh; that makes sense.

Maybe I should refrain from posting my gut reactions (or at least wait until I'm awake first). =)

No, it was a very good point. Not everyone adds hashes to filenames, and to me it seems that you're right in that weird caching can break pages that way.

If indeed this is the case, subresource integrity needs a big warning sign about that. For me, your comment was that warning sign, so please keep posting while you're not awake yet.

Why would it need a warning? If the HTML provides a new integrity="" hash, then any cached version obviously wouldn't pass. Subresource integrity makes it easier to determine if a cached file has expired. The file can be permanently cached for any HTML that requests the same hash value(s).
The browser could do something with this, but I believe it doesn't. Instead the algorithm is just:

1) Load the resource specified in src (from network or cache)

2) If there's an integrity attribute, verify its hash
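Step 2 can be sketched in Python. This approximates the matching rules in the W3C SRI spec as I understand them (ignore unknown algorithms, check only the strongest algorithm listed, accept if any digest for it matches); it is not browser source, and the function names are made up:

```python
import base64
import hashlib

# Hash functions SRI recognises, ordered weakest to strongest.
_ALGORITHMS = ["sha256", "sha384", "sha512"]

def parse_integrity(attr: str) -> list:
    """Split 'sha384-AAA sha256-BBB' into [('sha384', 'AAA'), ('sha256', 'BBB')]."""
    items = []
    for token in attr.split():
        alg, sep, value = token.partition("-")
        if sep and alg in _ALGORITHMS:  # unknown algorithms are ignored
            items.append((alg, value.split("?", 1)[0]))  # drop any ?options suffix
    return items

def integrity_matches(resource: bytes, attr: str) -> bool:
    items = parse_integrity(attr)
    if not items:
        return True  # nothing usable to check, so the load is not blocked
    # Only the strongest algorithm present is checked; any digest for it may match.
    alg = max((a for a, _ in items), key=_ALGORITHMS.index)
    actual = base64.b64encode(hashlib.new(alg, resource).digest()).decode("ascii")
    return any(value == actual for a, value in items if a == alg)

body = b"alert(1);"
attr = "sha256-" + base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")
print(integrity_matches(body, attr))          # True
print(integrity_matches(b"alert(2);", attr))  # False
```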

SRI allows one to specify multiple hashes. In other words, to prevent this particular mismatch, one could include the hash of the new resource as well as the previous valid hash.
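A sketch of that rollover, assuming hypothetical helper names and SHA-384 for both entries:

```python
import base64
import hashlib

def sri_value(content: bytes, alg: str = "sha384") -> str:
    digest = hashlib.new(alg, content).digest()
    return f"{alg}-{base64.b64encode(digest).decode('ascii')}"

def rollover_integrity(new: bytes, previous: bytes) -> str:
    # Both entries use the same algorithm, so a browser that only compares
    # against the strongest algorithm listed will accept either file version.
    return " ".join([sri_value(new), sri_value(previous)])

print(rollover_integrity(b"var v = 2;", b"var v = 1;"))
```

Two caveats: mixing algorithms (say sha512 for the new file and sha256 for the old one) defeats the trick, because only the strongest algorithm is checked; and while both hashes are listed either version is accepted, so the old hash should be dropped once caches have turned over.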
> What about caching? If the HTML and the JS are both updated, but the browser receives the new version of one and the old version of another, this will break your page.

Only if your page requires JavaScript to function and doesn't gracefully degrade. None of us would ever write that sort of page, would we?

It would break anyway because pages are usually designed to degrade when JavaScript is disabled, not when the JavaScript fails to load or behaves in an unexpected way.

For example the <noscript> tag works that way.

None of us would ever indulge in soapbox politics, would we?
You must give each new Javascript version a different filename (by including either the hash or a version number) and keep old Javascript version available forever or at least for a large enough timespan.
Would love for the next generation of SRI to include signatures as an option (e.g. integrity="ed25519-<public_key>").

Hashes mean you have to specify an exact version, so there's not an easy way to add integrity to things like Google's CDN for jQuery that has latest minor version update links for the major API versions of jQuery.

Of course, that means also adding a signature to the payload response (maybe an "Integrity: <hash>-<sig>" header?). So it's understandable why signatures weren't in scope for the first release.

Signatures are taken care of by connecting via TLS.

If a hypothetical attack breaks TLS or you don't use it, you can just change the public key served.

This is to prevent files on a 3rd party CDN from being loaded if they've been replaced with malicious ones.
Ah, I see. I misunderstood. Though signatures seem to be just adding another part in the deployment process where you update the files themselves as well as the pages they're loaded from.

Is there any security gain from doing that?

If you include content produced by a third party (e.g. JQuery) off a CDN, right now you can use the hash-based SRI mechanism to make sure that only the exact file you specified can be included, otherwise the CDN could suddenly send any compromised code. The file can't be changed, because otherwise the hash wouldn't match.

With a signature, you could specify "include cdn.com/jquery-X if signed by the JQuery project", so JQuery could publish security updates and those could be rolled out to the CDNs and included in all pages automatically, without the site owners having to make changes (if the security fix doesn't break compatibility).

For your own content, you'd mostly gain the convenience of not having to update the hashes on all the pages including the resource.

This is more convenient but less secure than a straight-up hash. If an attacker compromises the JQuery signing key, they could still serve malicious files. With a hash, the authenticity is ONLY dependent on the TLS connection to the main website, e.g. github.

TL;DR:

* hash: need to compromise the main website, that supplies (and authenticates) the hash

* signature by CDN: attacker can either compromise the main website OR <del>the third party CDN</del> <ins>author/signer of the third-party resource</ins>

(edit: correction as pointed out by response)

A content hash is good enough. The trusted hash is sent over an already trusted channel (TLS).

Thanks for the downmods.

The point is that with a signature you wouldn't have to change all the pages including the resource, but just sign the updated resource with the same key.
It's just a semantic question: does a URL point to a specific version of a resource, or does it point to whatever the server considers to be the resource at a given time?

It would seem more desirable to be able to point to a specific version, instead of allowing a third party to be able to insert implicitly trusted code without acknowledgement.

This is nice and all, but as someone who is security-paranoid I really wish Github would spend some effort improving their access control model. Today, Github access control is extremely coarse-grained, such that if I want to give someone permission to merely set labels on issues, I also have to give them permission to push arbitrary changes to the master branch. Additionally, the access control model is weird: I can define "teams" with some set of members and some set of repositories they can access, but the entire "team" must have the same access level to all repositories they can access, making it hard to define some repositories as being more sensitive than others. (Or, possibly, I've misunderstood the model, but if so that's its own problem.)

This matters: If someone wants to hack my company, they're not going to do it by hacking Github's CDN. They're going to do it by targeting particular employees -- probably focusing on those who have the least security experience. To reduce risk, I need to give each team member the least authority they need to do their job. Github is making it really hard for me to do that; I tend to have to give "admin" rights to everyone. :(

I really wish browsers could leverage this for caching across origins. If my copy of jQuery has the same SHA-256 as another file the user has already downloaded, there's no need to load it again.
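What that would amount to is a cache keyed by content hash rather than by URL. A toy sketch, with a hypothetical class name and placeholder bytes standing in for jQuery; no browser implements this as far as I know:

```python
import base64
import hashlib
from typing import Dict, Optional

class ContentAddressedCache:
    """Toy cache keyed by an SRI-style hash instead of by URL (hypothetical)."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    @staticmethod
    def key(content: bytes) -> str:
        return "sha256-" + base64.b64encode(hashlib.sha256(content).digest()).decode("ascii")

    def put(self, content: bytes) -> str:
        k = self.key(content)
        self._store[k] = content
        return k

    def get(self, integrity: str) -> Optional[bytes]:
        # A hit could be served without touching the network, whatever URL the
        # page named, because the stored bytes are known to hash to `integrity`.
        return self._store.get(integrity)

cache = ContentAddressedCache()
k = cache.put(b"/* placeholder for a jQuery build */")  # site A fetched it from its CDN
print(cache.get(k) is not None)  # True: site B, same hash, skips the download
```

The replies below explain why this is risky: whether `get` hits or misses is observable to the page asking.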
There's subtle, dangerous ways this can be exploited. (Short version: It'd make SRI usable as an oracle to confirm or deny guesses for the content of a cross-domain resource.)
Couldn't this be mitigated by user-agents introducing random, Poisson-distributed delays in all cached responses? The peak of the distribution could be made user-configurable to make it even harder to predict a user-agent's behavior.
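A sketch of the proposed padding, assuming the delay is drawn in whole milliseconds with a user-configurable mean (for a Poisson distribution the peak sits at roughly the mean). The function name is hypothetical and sampling uses Knuth's method:

```python
import math
import random

def cache_hit_delay_ms(mean_ms: float, rng: random.Random) -> int:
    """Poisson-distributed extra delay before serving a cache hit (Knuth's method)."""
    threshold = math.exp(-mean_ms)  # fine for the small means used here
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

rng = random.Random(0)
print([cache_hit_delay_ms(20.0, rng) for _ in range(5)])
```

One caveat worth weighing: random jitter raises the number of measurements an attacker needs rather than removing the signal, since noise averages out over repeated trials.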
How is that dangerous?
It leaks private user info -- a malicious server could include a JS file confirmed to be highly sensitive/top secret, and measure whether the client already has that cached. If so then the user is confirmed a sensitive target.
No, there's a worse attack possible: you can attempt to include a resource with sensitive contents with SRI, and use the SRI to make a "guess" at the hash of the contents. If your guess is incorrect, the resource will fail to load, and you can detect this error and make another guess.

Obviously, this technique will only work if the contents of that resource are constrained enough that it's possible to guess them with brute force. Depending on how SRI interacts with the browser cache, though, it may be possible to make guesses very quickly -- it is likely that the browser will only fire one HTTP request for the initial attempt, and will load the resource from cache for all subsequent attempts.
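A toy simulation of that guessing loop. The `loads_ok` oracle is a plain function standing in for "embed the cross-origin resource with this integrity value and see whether onload or onerror fires"; no browser is involved, and the secret and candidates are made up:

```python
import base64
import hashlib
from typing import Callable, Iterable, Optional

def sri(content: bytes) -> str:
    return "sha256-" + base64.b64encode(hashlib.sha256(content).digest()).decode("ascii")

def guess_contents(loads_ok: Callable[[str], bool],
                   candidates: Iterable[bytes]) -> Optional[bytes]:
    """Return the candidate whose hash the (simulated) browser accepted, if any."""
    for guess in candidates:
        if loads_ok(sri(guess)):
            return guess
    return None

secret = b'{"logged_in_as": "alice"}'  # what the victim's session would return
oracle = lambda integrity: integrity == sri(secret)
names = ["bob", "alice", "carol"]
print(guess_contents(oracle, (('{"logged_in_as": "%s"}' % n).encode() for n in names)))
```

As far as I know, this is why the SRI spec only applies integrity checks to cross-origin responses fetched with CORS (hence `crossorigin="anonymous"`): an opaque response fails the check no matter which hash was guessed.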

You could just add a new public=true option to counter this. I think you can already check that with an iframe (or JS head injection and timing) anyway; no need for CSP for that.
Or require crossorigin="anonymous", maybe in combination with Cache-Control: public.
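That gate could look like the following sketch. It is a hypothetical policy expressing the suggestion above, not anything a browser ships; it also treats `private` and `no-store` as disqualifying:

```python
from typing import Optional

def may_share_across_origins(cache_control: str, crossorigin: Optional[str]) -> bool:
    """Allow a cross-origin cache hit only for anonymous-CORS, Cache-Control: public responses."""
    directives = {
        d.strip().split("=", 1)[0].lower() for d in cache_control.split(",") if d.strip()
    }
    return (
        crossorigin == "anonymous"
        and "public" in directives
        and not directives & {"private", "no-store"}
    )

print(may_share_across_origins("public, max-age=31536000", "anonymous"))  # True
print(may_share_across_origins("private, max-age=60", "anonymous"))       # False
```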
So to protect against a single malicious server who might discover that we had previously loaded a cached resource, we shouldn't implement a cross-origin cache and have to make repeated requests, guaranteeing 3rd parties (the CDN) keep getting GET requests?

You're just trading one problem (someone learning I previously requested a file) for another (leaking referrers to a CDN).

Also, if you're loading "highly sensitive/top secret" data with a <link integrity="" href=""> or <script integrity="" src=""> tag, you have bigger problems.

I see, thanks!
I think another subtle exploit is that you can potentially track whether a user has visited a website. E.g., site1 uses SRI on their unique resource, and site2 uploads the same resource and uses SRI on theirs, so now site2 knows whether a user has been to site1.
ahahah and slowly bittorrent takes over http :D
We've been on our way for a while now with IPFS.

https://ipfs.io/

This is one of the best additions to the Web Platform as of late IMHO. Great if you run an operation with a lot of third party code coming in from sources that you don't control - even beyond the security concerns for just "keeping them honest" about the scripts they run on your page. I hope it gets adopted by all browser vendors soon.
This looks like a fantastic technology to protect against maliciously injected javascript. Great to see GitHub leading the charge here and taking their security seriously.
As mentioned in the article, they were victims of such an attack.

Frankly I'm relieved to see that browser vendors and leading tech firms are maintaining control of the situation and protecting users, even if driven by self-interest.

    Widespread adoption of Subresource Integrity could
    have largely prevented the Great Cannon attack
    earlier this year.
Sorry, it wouldn't have. From the CitizenLab report [1] on the Great Cannon attacks:

    In the attack on GitHub and GreatFire.org, the GC
    intercepted traffic sent to Baidu infrastructure
    servers that host commonly used analytics, social,
    or advertising scripts.  If the GC saw a request
    for certain Javascript files on one of these servers,
    it appeared to probabilistically take one of two
    actions: it either passed the request onto Baidu’s
    servers unmolested (roughly 98.25% of the time),
    or it dropped the request before it reached Baidu
    and instead sent a malicious script back to the
    requesting user (roughly 1.75% of the time).  In
    this case, the requesting user is an individual
    outside China browsing a website making use of a
    Baidu infrastructure server (e.g., a website with
    ads served by Baidu’s ad network).  The malicious
    script enlisted the requesting user as an unwitting
    participant in the DDoS attack against GreatFire.org
    and GitHub.
So the idea is someone runs a site with:

    <script src="http://baidu.com/ads.js"></script>
When visitors request these scripts the request passes through the "Great Cannon" which 1.75% of the time serves a different script instead. That malicious script makes lots of requests to the victim sites, and they're overloaded.

To prevent this sort of attack with SRI you would need to change your page to look like:

    <script src="http://baidu.com/ads.js"
            integrity="sha256-(base64 hash of the real ads.js)"
            crossorigin="anonymous"></script>
The problem is, Baidu isn't going to be willing to commit to always serving the same ads js: they need to be able to make upgrades.

SRI is useful in the case where the entity producing the html is referencing js that they've uploaded to a third party CDN or js where they choose what version to run, but not in the normal "include a snippet and we'll do stuff to your page" model.

(To block the Great Cannon there, what would have worked would be moving the js serving to HTTPS.)

[1] https://citizenlab.org/2015/04/chinas-great-cannon/

Couldn't the Great Chinese Firewall just intercept GitHub.com's HTML page as well and change the subresource integrity hashes? I thought the Great Chinese Firewall already had the ability to penetrate SSL connections via some means.
The "Great Cannon" attack that they talk about in the blog post wasn't caused by replacing JS in GitHub pages. It replaced a Baidu Analytics script, used across the Chinese internet on thousands of websites, with a malicious one intended to DDoS GitHub from people's home browsers when these websites were accessed outside of China.

The way that this fixes the issue is by ensuring that the file being loaded on those thousands of websites is the correct one, and not the malicious attack script that was injected by the Chinese government or other such actors, otherwise it's not run at all.

Could the Chinese government rewrite the HTML of all these thousands of websites to also change the hash? Theoretically yes, but practically it makes it much more difficult.

The Great Firewall would probably have copies of private keys issued by CNNIC, and there's a bunch of attacks to get private keys via heartbleed, and a bunch of Debian easily guessable private keys, but there's no general purpose 'penetrate SSL' attack that we know of right now.
Given control of a certificate authority can the Chinese government issue a new certificate for github.com? I assume they can enforce that computers sold in China have their authority in the default trust list, at which point I think all bets are off when it comes to SSL.
Yes, however if they can change the contents of the HTML they can probably modify CSP headers, which means they can just deliver whatever payload they want directly and wouldn't need to modify the integrity hashes.
They could (assuming that they can infiltrate SSL as you said). I think this is more oriented towards a different attack vector whereby the controller of a resource (JS, CSS, etc.) can alter that resource while the parent page remains unaffected.
Yes, though it involves actively intercepting every request for every page and rewriting the HTML to replace (or just remove) integrity attributes; that's a lot harder than just wholesale replacing the contents of specific JavaScript files on their way across the firewall.
<script src="..."> is very dangerous. At best, you can vet the src and check whether it's benign. Often, that vendor's "1 line of JavaScript to get our whiz-bang service" in turn loads other JavaScript files. I don't see how cryptographically verifying the bootloader solves anything in this case. Compromised analytics or vendor JavaScript will still lead to total site pwnage if I'm reading this right.
This protects you from providers that go rogue or are compromised after you enable their JS.

It also lets you use CloudFront as a CDN for your own JS without having to trust them to serve the content as you described it, if you calculate your hashes based on the scripts you sent them.

The parent poster's point is about providers that tell you to include script A which then loads X and Y. Knowing A can't change isn't very helpful in this situation as X and Y could change.
Careful! I've seen proxies (TracFone I think) subtly modify JSON files by removing whitespace, probably in the name of download speed. That will break the hashing.

If you start seeing unexplained errors on pay-as-you-go phones, you'll know why; although if this facility gains popularity then I'm sure they'll be pressured to stop modifying content.
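A quick illustration of why: the same data with the whitespace stripped is a different byte string, so it has a different hash. The JSON document here is made up:

```python
import base64
import hashlib
import json

def sri(content: bytes) -> str:
    return "sha256-" + base64.b64encode(hashlib.sha256(content).digest()).decode("ascii")

original = b'{\n  "version": 2,\n  "items": [1, 2, 3]\n}\n'
# What a whitespace-stripping proxy might forward instead: same data, fewer bytes.
proxied = json.dumps(json.loads(original), separators=(",", ":")).encode()

print(json.loads(original) == json.loads(proxied))  # True: identical data
print(sri(original) == sri(proxied))                 # False: integrity check fails
```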

This is not possible if you are loading resources over HTTPS (unless the carrier has installed a root certificate on your device, in which case you're not in a great place security-wise anyway).
By the way, I have an SRI tester to determine whether your browser supports SRI. SRI is still very new and doesn't have a lot of browser support yet.

https://ejj.io/sri/

The next step: A distributed, content-addressed caching system that allows the web browser to fetch the data from the fastest/nearest caching server by hash.

IPFS comes to mind.

Edit: the post below is right, nonces are only for inline scripts: https://bugs.webkit.org/show_bug.cgi?id=89577

original: IIRC CSP already has hashes for resources, which also would handle this purpose.

As a side note, there's at least one CDN already hosting a fake copy of Bootstrap; I've seen a malicious extension loading it in my report-uri.io logs.

afaik CSP hashes are only for inline resources, but I could be wrong on that.
You can use https://srihash.org to hash links and update your HTML.
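If you'd rather hash files locally during a build, a small helper does the same job (the familiar `openssl dgst -sha384 -binary FILE | openssl base64 -A` pipeline gives the same digest). The function name and the cdn.example.com URL are placeholders:

```python
import base64
import hashlib
import html

def script_tag(src: str, content: bytes, alg: str = "sha384") -> str:
    """Build a <script> tag whose integrity attribute pins `content`."""
    digest = base64.b64encode(hashlib.new(alg, content).digest()).decode("ascii")
    return (
        f'<script src="{html.escape(src)}" '
        f'integrity="{alg}-{digest}" crossorigin="anonymous"></script>'
    )

print(script_tag("https://cdn.example.com/lib.js", b"console.log('hi');"))
```

Hash the exact bytes you upload to the CDN, e.g. `script_tag(url, open("lib.js", "rb").read())`; `crossorigin="anonymous"` is included because a cross-origin file has to be fetched with CORS for the check to pass.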
This is great, but only if your CDN is not also serving your HTML files! (static sites)
For a static site I expect you would be far less concerned about session hijacking or XSS if someone took over that domain. Even a complete single-page app should serve the initial html request from a trusted domain/server.
Should be "sha256-..." (without dash between sha and 256)
Thanks. I updated the post and opened a PR to fix the README on sprockets-rails. https://github.com/rails/sprockets-rails/pull/273
This is an excellent idea. So long as you trust the server you're talking to, and it's using TLS, you can eliminate attack vectors by a compromised CDN this way.

Bravo. :)

It's nice that Github, Inc. likes subresource integrity. Did they put it on their web pages? As of right now, it doesn't seem to be on their home page. The next big step is for Wordpress to support it.

Subresource integrity is in some ways more important than "HTTPS Everywhere", because the MITM-as-a-service sites such as Cloudflare subvert HTTPS Everywhere. For security reasons, you might choose to serve your home page and a few security-critical pages from your own server, without using a CDN. But run everything else through the CDN, using subresource integrity to keep the CDN honest.

With subresource integrity, many items no longer need to be encrypted. This is good for security. Encryption interferes with caching, and HTTPS in front of caches means that the attack surface is larger, and includes the CDN.

(Yes, there's an argument that HTTPS conceals what the user was browsing. Not really. Checking document length will provide a good hint on what static asset was read. The pattern of document lengths requested tends to fingerprint the page being read.)
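A toy illustration of that claim, with made-up page names and sizes. Real traffic analysis has to cope with TLS record overhead, compression and caching, which the hypothetical `tolerance` parameter only crudely stands in for:

```python
from typing import Dict, List, Optional, Sequence

def guess_page(observed: Sequence[int], known: Dict[str, List[int]],
               tolerance: int = 0) -> Optional[str]:
    """Match a set of observed response sizes against known per-page asset sizes."""
    obs = sorted(observed)  # fetch order varies, so compare as sorted lists
    for page, sizes in known.items():
        ref = sorted(sizes)
        if len(ref) == len(obs) and all(abs(a - b) <= tolerance for a, b in zip(ref, obs)):
            return page
    return None

known = {"/pricing": [18234, 90211, 4410], "/about": [12001, 90211, 3022]}
print(guess_page([90230, 4425, 18250], known, tolerance=32))  # /pricing
```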

Log in and inspect the home page afterwards. http://imgur.com/bwUHgcT
I'm logged in. Not seeing it. Maybe it's not deployed for all accounts yet.
It's only included for browsers that support it. That's Chrome>45 and Firefox>43.
Serving different content based on the user agent? Bad site. No donut.