by realusername 4301 days ago
It's partially related but something I would really like to have is a cross-website cache for public scripts. Around 80% of the size of the scripts is public libraries used by almost everyone (jquery, bootstrap, moment.js, various jquery plugins, angular...) and each of them is downloaded thousands of times.

One simple solution could be something like this:

<script type="text/javascript" src="/js/jquery.min.js" public="sha1:356a192b7913b04c54574d18c28d46e6395428ab"></script>

This way the browser can look at the hash and skip requesting the file entirely if it already has a copy. This should not lead to security issues, since the hash saved by the browser is not the one declared in the tag but the one computed from the actual file (and obviously you would only use the public attribute for scripts which are meant to be public).

With this technique, the most popular libraries could be cached and not downloaded by users.
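To make the idea concrete, here is a rough sketch of a cache keyed by the hash of the bytes actually received rather than by URL. Everything in it is hypothetical (the `HashKeyedCache` class, the `load_script` helper, the `fetch` callback), and it uses SHA-256 instead of the SHA-1 in the example tag, which is my assumption and not part of the proposal.

```python
import hashlib


class HashKeyedCache:
    """Toy cross-site script cache (hypothetical), keyed by a computed digest."""

    def __init__(self):
        self._store = {}  # computed sha256 hex digest -> script bytes

    def lookup(self, declared_digest):
        # A hit means some earlier page already delivered bytes with this digest.
        return self._store.get(declared_digest)

    def store(self, body):
        # The key is computed from the body, never copied from the page's
        # attribute, so a page cannot poison the cache by declaring a hash
        # that its file does not actually match.
        digest = hashlib.sha256(body).hexdigest()
        self._store[digest] = body
        return digest


def load_script(cache, declared_digest, fetch):
    """Return (body, source); `fetch` stands in for the network request."""
    cached = cache.lookup(declared_digest)
    if cached is not None:
        return cached, "cache"
    body = fetch()
    cache.store(body)
    return body, "network"


if __name__ == "__main__":
    cache = HashKeyedCache()
    jquery = b"/* stand-in for jquery.min.js */"
    digest = hashlib.sha256(jquery).hexdigest()
    print(load_script(cache, digest, lambda: jquery)[1])  # network
    print(load_script(cache, digest, lambda: jquery)[1])  # cache
```

The second call never invokes `fetch`, no matter which site or URL the tag pointed at, which is the whole point of keying on the hash.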

6 comments

Subresource Integrity (SRI) addresses the problem only indirectly, but it does let you add checksums to resources. There are various security considerations with regard to caching, and I don't think the doc touches on them all: http://www.w3.org/TR/SRI/#caching-optional-1

Anyhow, that might be a good place to contribute.
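For anyone who wants to play with it, this is roughly how an SRI-style checksum can be computed: the algorithm name, a dash, then the base64 of the raw digest. The `sri_integrity` helper and the sample script body are made up for illustration; only the value format follows SRI.

```python
import base64
import hashlib


def sri_integrity(body: bytes, algorithm: str = "sha384") -> str:
    """Return '<algorithm>-<base64 of the raw digest>' for the given bytes."""
    if algorithm not in ("sha256", "sha384", "sha512"):
        raise ValueError("expected sha256, sha384 or sha512")
    digest = hashlib.new(algorithm, body).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


if __name__ == "__main__":
    # Stand-in bytes; in practice you would read the real library file.
    script = b"console.log('hello');"
    print(f'<script src="/js/app.js" integrity="{sri_integrity(script)}"></script>')
```

The browser recomputes the digest over what it downloaded and compares it with the attribute, so the value has to be generated from the exact bytes you intend to serve.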

In addition to the (significant) bandwidth savings, this is an important idea for privacy/tracking reasons as well. I may be fine with websites A, B, and C logging that I made a request for one of their pages, but I'd rather not give Google[1] the browsing path A->B->C just because they host jQuery.

While browsers having an internal copy of various common scripts is a great idea, I was briefly working on a Firefox addon that would simply hard-cache any URL that matched some sort of criteria (e.g. regexp for "//ajax.googleapis.com/ajax/libs/.*\.js")

Unfortunately, the project is on hold for now. While it was easy to match HTTP requests with an observer for 'http-on-modify-request', the nsIHttpChannel[2] object you get from that only seems to let you redirect the request. I considered trying to redirect to a "chrome:" or "file:" URL, but that seems like a horrible solution. The real way to mess with HTTP loading and caching, unfortunately, is buried somewhere I have yet to find. :/

[1] or any other shared CDN, such as CloudFlare and their horrible hashed domain names

[2] https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/...

That would be really neat. It could also solve a security issue with public CDNs.

Right now there's nothing to stop a malicious CDN from changing the content of an included script on your site without you knowing it.

With a hash tag like this the browser could refuse to load the file or warn the user if it didn't match.

You could have a small JS snippet on the page (served from your own domain) that checks the hash of the JS loaded from a CDN before running it.
That would work well for most, until /jquery-latest.min.js or whatever is updated to the latest release. But that would also be a problem with the browser-based solution.
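A quick sketch of that check, and of why a moving "latest" URL defeats it. It is written in Python only to show the logic; a real on-page snippet would be JavaScript. The `verify_script` name and the two stand-in file bodies are hypothetical, and SHA-256 is my assumption.

```python
import hashlib
import hmac


def verify_script(body: bytes, pinned_sha256_hex: str) -> bool:
    """True only if the fetched bytes hash to the digest pinned by the site."""
    actual = hashlib.sha256(body).hexdigest()
    return hmac.compare_digest(actual, pinned_sha256_hex.lower())


if __name__ == "__main__":
    v1 = b"/* library, release 1 */"
    v2 = b"/* library, release 2 */"
    pinned = hashlib.sha256(v1).hexdigest()  # recorded when the site was built

    print(verify_script(v1, pinned))  # True: same bytes as when pinned
    print(verify_script(v2, pinned))  # False: the URL now serves a new release
```

A legitimate update and a tampered file look identical to this check, so it only works against a URL whose content never changes.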

The question then is - how do you distribute the trusted hash?

Maybe there should be an independent organization or website that serves trusted hashes for common or registered libraries and files.

Right, you can't verify hashes for resources that change. You'd have to link to a specific version that everyone can agree on. As for trusting the hash itself - I guess someone you trust (probably the author) would have to sign the hash, then you could verify the signature.
As long as the author isn't serving the signed hash via the same CDN as the files. Then there's the logistics problem of having to look in a different hash location for each file.

I'm just thinking of some libraries that could be security sensitive, and thus using latest releases on day 1 is the most important. I surmise these would also be the same libraries you would want to use this type of authentication on.

If an attacker changes the signed copy on the CDN, the signature check will fail.
Maybe browsers should ship with these libraries so nobody's relying on every single random website to be impenetrable.
Then you end up with the question of "how do you decide what libraries to include"
Exactly.

Then you create an entirely new, fragmented ecosystem like the current html and css web standards, adding more complexity and layers to front-end web development.

Best that the browsers stay agnostic in that regard.

http://trends.builtwith.com/javascript

All the browser companies are in a particularly good position to collect this information too.

Or they could all link to the same copy, like Google's hosted libraries. https://developers.google.com/speed/libraries/devguide#Libra...
Did you read the post? ;)

It needs to be built into the browser because of issues like the one he was having.

This solves the problem on the server side, using existing standards, without building any new tech into the browser.
The problem is that the current "solution" is to cache based on the URL, which breaks if the URL is not accessible, as in this instance.

The suggestion solves that issue by using hashes of the files, so it doesn't matter if they are loaded from a remote/CDN URL or from the same server, they will be considered cached by the browser (and loaded from cache) regardless once the hash matches.

Did you read the article?

> It turns out that many websites are loading content from Google’s CDN, or Facebook/Twitter APIs, which are blocked in China.

I'm not commenting on the article, I'm replying to realusername's "partially related" idea.
It doesn't solve the problem because of the OP's issues.

Using a hash would allow you to load them from any URL, including the blocked ones.

No, it doesn't.
Realusername would like "a cross-website cache for public scripts". That's what this does. Every site gets to load the version from your browser's cache without downloading it. The problem given is that "each of them is downloaded thousands of times", and this fixes that.
But you're presuming that the shared URL is available to the browser. The whole point of this story is that that presumption is absolutely false for internet users in China. I'm betting that you'd find the same to be true for users in Iran, North Korea and any other embargoed nation. Realusername's solution was an attempt to solve the problem for everyone without writing off the billion or so users unfortunate enough to live in repressive countries.

But, you know, you live in the US and your solution works for everyone in the US, so F everyone who doesn't.

Great - use the hash of an obscure site specific script, then detect how quickly the script loads and you know whether your victim has visited the site because they have it in their cache. Looks like a surefire route to a cache information leak to me.
You can already do that.

     good.com:
        <script src="/js/site.js">

     evil.com:
        <img src="https://www.good.com/js/site.js">
Then use the navigation timing api to figure out whether the js was already in cache.
(Actually, you could use the onload event; you don't actually need navigation timings.)
Cache information leakage for common js libraries is a non-issue compared to a CDN being compromised and mass MITMing via javascript libraries.

Even things like NoScript don't stop that vector if you whitelist common CDNs like Google's.

The main goal of a technology like this is to help the caching of common scripts, not to cache your entire website.

The only information you could get from this is that the browser already downloaded jquery from another website, which is not going to help much.

Just make sure that you use a properly secure hash...