Hacker News new | ask | show | jobs
by jerf 3092 days ago
I can imagine some ways to armor the branch predictor, similar in principle to how languages like Perl have to include a random seed in their hash code (in some circumstances) to avoid being able to pre-compute values that will all hash to the same thing [1]. There should be some ways to relatively cheaply periodically inject such a randomization into the prediction system enough to prevent that aspect of the attack. This will cost something but probably not be noticeable to even the most performance-sensitive consumers.

But no solution leaps to mind for the problem of preventing speculative code from leaking things via cache, short of entirely preventing speculating code from being able to load things into the cache. If nobody can come up with a solution for that, that's going to cost us something to close that side channel. Not sure what though, without a really thorough profiling run.

And I'd put my metaphorical $5 down on someone finding another side channel from the speculative code; interactions with changing processor flags in a speculative execution, or interaction with some forgotten feature [2] where the speculation ends up incorrectly undone or something.

Yuck.

[1]: https://blog.booking.com/hardening-perls-hash-function.html

[2]: https://www.youtube.com/watch?v=lR0nh-TdpVg - The Memory Sinkhole - Unleashing An X86 Design Flaw Allowing Universal Privilege Escalation (Dec 29, 2015)

2 comments

I'm thinking you might be right.

It's going to be really hard to give up real world gains from branch prediction. Branch prediction can make a lot of real world (read "not the finest code in the world") run at reasonable speeds. Another common pattern to give up would be eliding (branch predicting away) nil reference checks.

> short of entirely preventing speculating code from being able to load things into the cache

Some new server processors allow us to partition cache (to prevent noisy neighbors) [1,2]. I don't have experience working with this technology but everything I read makes me believe this mechanism can works on a per process basis.

If that kind of complexity is already possible in CPU cache hierarchy I wonder if it's possible to implement per process cache encryption. New processors (EPYC) can already use different encryption keys for each VM, so it might be a matter of time till this is extended further.

[1] https://danluu.com/intel-cat/

[2] https://lwn.net/Articles/694800/

This rant/thread is a good reading as well: https://lkml.org/lkml/2018/1/3/797

It's possible to key the cache in the kernel on CPL so at least there should be no user / kernel space scooping of cache lines.

It's possible we can never fully prevent all attacks in same address space. So certain types of applications (JIT and sandboxes) might forever be a cat and mouse game since we're unlikely to give up on branch prediction.

AFAICT injecting any sort of delay that prevents this attack would also completely negate any benefit from caches and that would take us back to 2000s performance at best, even with 10-16 core Xeon monsters. The branch predictor is really just a glorified cache prefetcher so you'd not only have to harden the branch predictor but anything that could possibly access the cache lines that the branch predictor has pulled up.
"The branch predictor is really just a glorified cache prefetcher so you'd not only have to harden the branch predictor..."

I was just thinking of the part they were talking about where it was too predictable, not the rest of the issues. Instead of a single hard-coded algorithm we could switch to something that has a random key element, like XOR'ing a rotating key instead of a hard-coded one, similar to some of the super-basic hashing some predictors already do. Prefetching I just don't know what to do with. I mentally started down the path of considering what it would take for the CPU to pretend the page was never cached in the first place on a misprediction, but yeow, that got complicated fast, between cache coherency issues between processors and all of the other crap going on there, plus the fact that there's just no time when we're talking about CPU and L1 interactions.

Timing attacks really blow. Despite the "boil the ocean" nature of what I'm about to say, I find myself wondering if we aren't better served by developing Rust and other stronger things to the point that even if the system is still vulnerable to timing attacks it's so strong everywhere else that it's a manageable problem. Maybe tack on some heuristics to try to deal with obvious hack attempts and at least raise the bar a bit. More process isolation (as in other links mtanski gives you can at least confine this to a proces). (What if Erlang processes could truly be OS processes as far as the CPU was concerned?) I'm not saying that is anything even remotely resembling easy... I'm saying that it might still be easier than trying to prevent timing attacks like this. That's a statement about the difficulty of fixing timing attacks in general, not the easy of "hey, everybody, what if you just wrote code better?", a "solution" I often make fun of myself.