by jackcook 1464 days ago
You can pull off attacks like this from JavaScript by repeatedly recording the time and training a machine learning model on traces of instruction throughput over time, which my group did in a recent paper: https://jackcook.github.io/bigger-fish/
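To make "repeatedly recording the time" concrete, here is a minimal sketch of one way such a trace could be collected: count how many loop iterations finish inside each fixed-length time window. It is written in Python for readability rather than in browser JavaScript, and the function name `collect_trace`, the 5 ms window, and the window count are illustrative assumptions, not values taken from the comment above.

```python
import time


def collect_trace(num_windows: int = 50, window_ms: float = 5.0) -> list[int]:
    """Return one loop-iteration count per fixed-length time window.

    Hypothetical helper: the name and default parameters are assumptions.
    """
    window_s = window_ms / 1000.0
    trace = []
    for _ in range(num_windows):
        count = 0
        deadline = time.perf_counter() + window_s
        # Busy-loop until the window ends. Anything that steals CPU time
        # from this loop (other processes, interrupts) lowers the count.
        while time.perf_counter() < deadline:
            count += 1
        trace.append(count)
    return trace


if __name__ == "__main__":
    # Each entry is a rough proxy for instruction throughput in that window.
    print(collect_trace(num_windows=20, window_ms=5.0))
```

A sequence of these counts, recorded while the victim loads a page, is the kind of trace a classifier could then be trained on.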

Could you elaborate on this attack? It’s an interesting read, but I’m curious about practicality.

How would you ensure that the user loads your malicious script, and has a running web worker for it?

I see that you trained it on 100 websites. Would you need to retrain for every new version deployed or different paths with varying content?

If your intention is to detect sensitive website accesses, wouldn’t you need those websites to be public to train the model first? I’m not convinced that detecting porn access is particularly malicious, but I acknowledge that it is illegal in some places.

You'd just need to put the script on any webpage the user might access and leave open, such as Google, or Facebook, or whatever. The attack isn't specific to JavaScript, so really you could put this in a desktop app too, think Slack, Spotify, etc. Any app or website that you know the target user is likely to open. CDNs are also a great target.

We evaluated on 100 websites as a proof of concept, but we also included experiments in an "open world" setup where the classifier has to predict whether the activity is from one of 100 sensitive websites, or whether it's none of them, and found that it's still very accurate in that more realistic setup. You would need to retrain to identify more websites outside of your set of 100.
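The open-world setup can be sketched as an ordinary classifier with one extra catch-all label for "none of the monitored sites". The example below uses a nearest-centroid classifier purely as a simple stand-in; it is not the model from the paper, and the label names, the `OTHER` constant, and the toy traces are all made up for illustration.

```python
from collections import defaultdict
from math import dist

OTHER = "other"  # hypothetical catch-all label for unmonitored sites


def train_centroids(traces, labels):
    """Average the traces of each label into a single centroid."""
    sums = {}
    counts = defaultdict(int)
    for trace, label in zip(traces, labels):
        if label not in sums:
            sums[label] = [0.0] * len(trace)
        sums[label] = [s + x for s, x in zip(sums[label], trace)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids, trace):
    """Return the label whose centroid is closest to the trace."""
    return min(centroids, key=lambda label: dist(centroids[label], trace))


# Toy data: two monitored sites plus traces from unmonitored sites.
traces = [
    [10, 2, 10, 2], [9, 3, 10, 2],   # site_a
    [2, 10, 2, 10], [3, 9, 2, 10],   # site_b
    [6, 6, 6, 6], [5, 7, 6, 6],      # anything else
]
labels = ["site_a", "site_a", "site_b", "site_b", OTHER, OTHER]

centroids = train_centroids(traces, labels)
print(predict(centroids, [9, 2, 9, 3]))
print(predict(centroids, [6, 6, 5, 7]))
```

Adding a new monitored site means collecting traces for it and retraining, which matches the limitation described above.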

The websites would need to be public, which is basically the same limitation as Hertzbleed, since the attacker needs to know what they're looking for in order to identify an activity. Some use cases with this limitation aren't too hard to imagine: maybe you're in a country that bans access to major Western news sites but you're evading censorship with a VPN.

I’m a little confused about your attack vector - how feasible would you reckon it is to place such a malicious script on the largest public websites in existence, versus just getting the victim to install a Trojan? The latter could just literally monitor the user.

I’m not saying your paper is technically wrong, just practically infeasible.

Right now, you’ve chosen very specific websites. Have you explored whether websites built on the same scripts (React, jQuery, etc.) become harder to tell apart? I was also curious about content/non-homepage paths. Your conclusion seems to be that interrupts/etc. are the primary indicators, so I suspect there’s a connection.

Edit:

In my experience, large websites and most web apps don’t use CDNJS/etc, but bundle their code - this would make injecting your script much harder without a supply chain attack.

On second thought, given CORS I think this attack is actually impossible. How would your embedded script communicate your findings with your server? You would need to control the originating domain itself…

I don't think any of these side channels are really easy to pull off without the technical capabilities of a nation state or something similar. I personally think embedding a malicious script in a CDN (e.g. https://blog.ryotak.me/post/cdnjs-remote-code-execution-en/) that serves a script for a large website, or something similar (https://blog.igorescobar.com/2016/08/21/ive-the-chance-to-tr...), is more realistic than getting the victim to install your program -- I would imagine sensitive individuals are very concerned about installing arbitrary software.

We did get a comment about this in our rebuttal but didn't end up including it in our final paper: we found that we distinguished sites with the same frameworks (such as React, Angular, and jQuery) at the same accuracy as sites that used different frameworks.

We didn't do much research into content/non-homepage paths but it's a good area for future research. I would suspect it'll still do pretty well.

And yes, we concluded that the signal came from interrupts (in Table 3 of our paper you can see we ran an experiment with frequency scaling turned off), which does make me question the practicality of Hertzbleed. I wouldn't doubt it can be exploited somehow though.
