Hacker News new | ask | show | jobs
by fooker 3088 days ago
retpoline seems to be a novel concept. Can anyone ELI5?

Also, any insight about performance impact here?

3 comments

An indirect jump is when your program asks the CPU to transfer control to a location that your code itself computes: "jmp %register". Compare to a direct jump, where the destination of the jump is hardcoded into the jump instruction itself: "jmp $0x100".

Most programs have indirect jumps somewhere. Higher-level languages with virtual function calls have lots of indirect jumps, because they parameterize functions: to get the "length" of the variable "foo", the function "bar" has to call one of 30 different functions, depending on the type of "foo"; the function to call is read out of a table at some offset from the base address of "foo". Or, another example is switch statements, which can compile down to jump tables.

What we want, to mitigate Spectre, is to be able to disable speculative execution for indirect jumps. The CPU doesn't provide a clean way to do that directly.

So we just stop using the indirect jump instructions. Instead, we abuse the fact that "ret" is an indirect jump.

"Call" and "ret" are how CPUs support function calls. When you "call" a function, the CPU pushes the return address --- the next instruction address after the "call" --- to the stack. When you return from a function, you pop the return address and jump to it. There's a sort of "jmp %register" hidden in "ret".

You abuse "ret" by replacing indirect jumps with a sequence of call/mov/jump, where the mov does a switcheroo on the saved return address.

The obvious next question to ask here is, "why don't CPUs predict and speculatively execute rets?" And, they do. So the retpoline mitigates this: instead of just "call/pop/jump", it does "call/...pause/jmp.../mov/jmp", where the middle sequence of instructions set off in "..." is jumped over and not executed, but captures the speculative execution that the CPU does --- the CPU expects the "ret" to return to the original "call", and does not know how to predict around the fact that we did the switcheroo on the return address.

How'd I do?

> Or, another example is switch statements, which can compile down to jump tables.

Is the overhead of the retpoline such that it's no longer a benefit to compile switches to jump tables?

Pretty well, thanks. What I'm wondering is: The attack is using the data fetched into the cache from a speculative indirect jump to do a timing attack and discover what's in the former, correct? Why can't the CPU mark the cache area it fetched in the speculative jump as "stale" and discard it? Why wouldn't that fix the problem?
I don't know any way to leave enough breadcrumbs to do that in four clock cycles, do you?
Intel is going to release a microcode update for BTB control apparently.
retpoline is just a convoluted way of doing an indirect jump/call designed to make branch prediction entirely useless. It's a novel concept because doing this is completely opposite to making a program run faster.

Here is an example of the most common programming patterns that end up causing indirect jumps/calls:

https://godbolt.org/g/eThmnG

Imagine every virtual function call in a C++ program being mispredicted and taking twice as long.

(Instead of forcing us to recompile the world, maybe Intel should just disable branch prediction in microcode.)

> Imagine every virtual function call in a C++ program being mispredicted and taking twice as long.

> (Instead of forcing us to recompile the world, maybe Intel should just disable branch prediction in microcode.)

Wouldn't the performance impact be dramatic ? In this[1] example there's a 6 times slowdown between situation with and without correct branch prediction.

[1]: https://stackoverflow.com/questions/11227809/why-is-it-faste...

I don’t think this is about all branch prediction—just about branch jump prediction. Like, “jump to %rax, but don’t try to guess what %rax is before you’re 100% certain.” Not the same as “jump to a known location if you think this here register is true/false”. As far as I can piece together, the exploit relies on making the branch predictor think the branch target will be somewhere you stored malicious code , which will then be executed by another process, e.g. a kernel. If it does harm before the branch predictor catches that it was wrong, you’re home free.

I’m not sure, but that’s what it looks like, so far.

This doesn't affect branches, only indirect jumps (and calls). The performance impact will still be considerable. It will make PGO more crucial (or smarter JITting, for VM languages) since the penalty can be avoided by prefixing an indirect call with a direct call to the most likely target - this is a well-known technique, useful on machines with weaker jump predictors than branch predictors.

Quite possibly, the worst affected code will be OO code that is dynamically (open) polymorphic.

FWIW, all branches will need to be followed by a memory load fence which seems to trip up speculative execution on Intel CPUs.
There is still branch prediction for normal calls or jumps, which are the majority and should be in performance conscious code.

It's just that some language features such as virtual functions in C++ often require indirect invocation when the compiler can't devirtualize a call, and there is lots of it in the kernel in performance-critical paths (think interrupts, syscalls).

There are two paragraphs dedicated to performance impact in the linked PR.
By design, with retpoline indirect branches won't be able to take advantage of branch prediction. This is nontrivial, but can't be helped. Performance impact should be negligible otherwise.
Considering that every single function call into a dynamically loaded library will be affected, that negligible "otherwise" won't be so negligible in the real world.