I wonder if adopting io_uring on Linux might let a browser preserve a little privacy in this specific case. (Though it is very hard to get right, unfortunately.)
My suspicion is that io_uring itself mitigates syscall overhead but doesn't do anything to change interrupts.
You could probably do things at the OS level to change interrupt behavior in a way that would mitigate this attack significantly. I'll need to read the paper to see if they discuss this.
Correct. Probably the only way to mitigate the interrupt side channel today is what they mentioned: intentionally inject noise into the system, their example being to make network requests to local addresses. Fundamentally, though, the challenge is that once you start doing that, you probably start degrading performance for your neighbors fairly quickly. It's really hard to balance mitigations against good performance. A more comprehensive solution probably involves redesigning how we build CPUs and operating systems rather than continuing to fight this in software.
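As a rough illustration of the trade-off, here is a toy model, not the paper's setup: an attacker times a loop that picks up a small interrupt-induced delay (`leak_us`) only while a victim is active, and the defender adds uniform noise of width `noise_us` to every sample. The baseline cost, the parameter names, and the uniform noise model are all assumptions. Averaging still recovers the delay; noise only raises the number of samples needed, roughly in proportion to the noise variance.

```python
import math
import random
import statistics

BASELINE_US = 100.0  # hypothetical cost of the attacker's timed loop


def sample(victim_active: bool, leak_us: float, noise_us: float, rng: random.Random) -> float:
    # The interrupt-induced delay appears only while the victim is active;
    # injected noise is modelled as uniform on [0, noise_us).
    delay = leak_us if victim_active else 0.0
    return BASELINE_US + delay + rng.uniform(0.0, noise_us)


def estimate_leak(n: int, leak_us: float, noise_us: float, seed: int = 1) -> float:
    # The attacker averages n samples in each state and subtracts the means.
    rng = random.Random(seed)
    active = statistics.fmean(sample(True, leak_us, noise_us, rng) for _ in range(n))
    idle = statistics.fmean(sample(False, leak_us, noise_us, rng) for _ in range(n))
    return active - idle


def samples_needed(leak_us: float, noise_us: float, z: float = 3.0) -> int:
    # Uniform noise of width w has standard deviation w / sqrt(12); the
    # difference of two n-sample means has std sigma * sqrt(2 / n).
    # Solving z * sigma * sqrt(2 / n) = leak_us for n gives the result.
    sigma = noise_us / math.sqrt(12.0)
    return math.ceil(2.0 * (z * sigma / leak_us) ** 2)
```

For example, `samples_needed(2.0, 50.0)` returns 938; doubling the noise width roughly quadruples that count. The defender, meanwhile, pays for the noise on every operation, which is where the performance cost for neighbors comes from.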
Interrupt noise can be eliminated by eliminating the interrupts themselves, using user-space drivers like SPDK and DPDK for storage and networking, but (a) that would require a massive change in application architecture, and (b) it wouldn't help with non-movable interrupts such as softirqs or the IPIs used for rescheduling and TLB shootdown.
Softirqs aren't really interrupts, and they're totally under kernel control, so it might be possible to spread them out across cores or otherwise reduce their signal.
Eliminating noise from IPIs for rescheduling and TLB shootdown might require crazy architectural changes to the CPU: for instance, an architecturally isolated fast timer (basically a separate CPU) that polls a queue of TLB shootdown requests and a wakeup request flag, and can exit without waking the main CPU from a halt.
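A toy model of that idea, assuming a hypothetical `Sidecar` unit that owns a queue of pending shootdown addresses and a wakeup flag. Shootdowns are drained without touching the main core's halt state; only an explicit wakeup request ends the halt. This is not an existing hardware interface, and in real hardware the invalidation itself would still change the core's TLB contents; the model only captures the "no wakeup" part.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class MainCore:
    halted: bool = True
    wakeups: int = 0  # how many times the halt was interrupted

    def wake(self) -> None:
        if self.halted:
            self.halted = False
            self.wakeups += 1


@dataclass
class Sidecar:
    """Hypothetical isolated unit that services requests on the core's behalf."""
    core: MainCore
    shootdowns: deque = field(default_factory=deque)  # pending page addresses
    wake_requested: bool = False
    flushed: list = field(default_factory=list)  # stands in for TLB invalidation

    def poll(self) -> None:
        # Drain shootdowns here, without touching the core's halt state.
        while self.shootdowns:
            self.flushed.append(self.shootdowns.popleft())
        # Only an explicit wakeup request takes the core out of its halt.
        if self.wake_requested:
            self.wake_requested = False
            self.core.wake()
```

Queuing two shootdowns and calling `poll()` leaves the core halted with zero wakeups; setting `wake_requested = True` and polling again wakes it exactly once.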
Fuzzing the timer seems like a hack: it doesn't eliminate the information leakage, it just makes it harder to measure. You can eliminate the signal by reporting only the amount of time that passes in user mode, but that results in a clock that can be arbitrarily slower than wall-clock time. I suppose you could add a correction factor that's heavily filtered, so the final timer is never off by more than a constant amount, but this would have to be implemented as a new OS timer type with instrumentation in every interrupt handler, and then JavaScript would have to be updated to use that new timer.
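A minimal sketch of such a clock, assuming the OS can report cumulative user-mode time alongside wall-clock time (both are passed in as plain numbers here, since no such timer type exists today). The correction tracks a filtered estimate of the fraction of time spent outside user mode, and the output is clamped to stay within `max_lag` of wall time, never ahead of it, and monotonic. `alpha` and `max_lag` are hypothetical tuning knobs.

```python
class FilteredUserClock:
    def __init__(self, alpha: float = 0.01, max_lag: float = 0.005) -> None:
        self.alpha = alpha      # filter weight: smaller means heavier filtering
        self.max_lag = max_lag  # the constant bound on error vs. wall clock
        self.rate = 0.0         # filtered fraction of wall time spent outside user mode
        self.correction = 0.0   # accumulated estimate of (wall - user)
        self.last_wall = 0.0
        self.last_user = 0.0
        self.last_out = 0.0

    def read(self, wall: float, user: float) -> float:
        d_wall = wall - self.last_wall
        d_user = user - self.last_user
        if d_wall > 0:
            # Fraction of this interval lost to interrupts, other tasks, etc.
            sample = (d_wall - d_user) / d_wall
            self.rate += self.alpha * (sample - self.rate)
        t = user + self.correction + self.rate * d_wall
        # Stay within max_lag of wall time and never ahead of it ...
        t = min(max(t, wall - self.max_lag), wall)
        # ... and never run backwards (wall is assumed monotonic).
        t = max(t, self.last_out)
        self.correction = t - user
        self.last_wall, self.last_user, self.last_out = wall, user, t
        return t
```

The clamp is where residual signal remains: a burst large enough to push the estimate past `max_lag` shows through directly, so `max_lag` trades accuracy against leakage.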
I come from embedded audio programming, where, for example, the variable load of UI code can cause problems (audible ones) in audio quality if you don't do things right.
Maybe we need to do things the other way around? Instead of trying to mask everything we are doing, we could run browsers/tabs in a processing environment where the noise can't be measured because it never occurs in the same time window. In audio, this is done with a high-priority fixed-rate timer that interrupts the rest of the processing.
My OS knowledge is too marginal to know whether that would truly be feasible, but I can't help thinking: yes, it is possible to fix this at a more fundamental level.
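To make the fixed-time-window idea concrete, here is a toy scheduler model, assuming two tabs: an always-busy attacker and a victim with varying demand per scheduling period. The period and slot sizes are illustrative. Under a work-conserving scheduler, the attacker inherits whatever the victim leaves unused, so its CPU share reveals the victim's load. Under fixed partitioning, in the spirit of the audio callback, the attacker's share is constant and the victim's slack simply idles.

```python
PERIOD_US = 1000          # one scheduling period (hypothetical)
SLOT_US = PERIOD_US // 2  # fixed slot per tab


def run(victim_demands_us: list[int], partitioned: bool) -> tuple[list[int], float]:
    """Return the attacker's CPU time per period and overall CPU utilisation."""
    attacker_obs = []
    busy = 0
    for demand in victim_demands_us:
        victim_used = min(demand, SLOT_US)
        if partitioned:
            # The attacker only ever runs in its own slot; victim slack idles.
            attacker_used = SLOT_US
        else:
            # Work-conserving: the attacker absorbs whatever the victim leaves.
            attacker_used = PERIOD_US - victim_used
        attacker_obs.append(attacker_used)
        busy += victim_used + attacker_used
    return attacker_obs, busy / (PERIOD_US * len(victim_demands_us))
```

With victim demands `[0, 200, 500, 800, 100]`, the work-conserving run gives the attacker `[1000, 800, 500, 500, 900]` at full utilisation, while the partitioned run gives a constant `500` per period at 76% utilisation. That lost utilisation is the performance cost of closing the channel this way.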
As far as browsers are concerned, the actual solution is banning JavaScript from the regular Web. JS is basically remote code execution (even more so now that JIT compilation is the norm); it is a terrible idea that will continue to create all sorts of problems.