


	by fooker 3088 days ago
	retpoline seems to be a novel concept. Can anyone ELI5? Also, any insight about performance impact here?

3 comments

tptacek 3088 days ago

An indirect jump is when your program asks the CPU to transfer control to a location that your code itself computes: "jmp %register". Compare to a direct jump, where the destination of the jump is hardcoded into the jump instruction itself: "jmp $0x100".

Most programs have indirect jumps somewhere. Higher-level languages with virtual function calls have lots of indirect jumps, because they parameterize functions: to get the "length" of the variable "foo", the function "bar" has to call one of 30 different functions, depending on the type of "foo"; the function to call is read out of a table at some offset from the base address of "foo". Or, another example is switch statements, which can compile down to jump tables.
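To make the table lookup concrete, here is a minimal C sketch of the "length of foo" example. The names (`struct foo`, `length_table`, `bar`, `bar_direct`) are invented for illustration, and the table is indexed by a type tag rather than stored at an offset inside the object, which is a simplification of how C++ vtables are usually laid out. Whether the compiler really emits an indirect call depends on optimization settings, so treat the comments about generated instructions as typical rather than guaranteed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "types" of foo; the names are made up for illustration. */
enum kind { KIND_STRING, KIND_ARRAY, KIND_COUNT };

struct foo {
    enum kind kind;     /* selects the entry in length_table */
    const void *data;   /* string bytes or array elements */
    size_t n;           /* element count, used by arrays only */
};

size_t string_length(const struct foo *f) { return strlen((const char *)f->data); }
size_t array_length(const struct foo *f)  { return f->n; }

/* One function pointer per type: the table the call target is read from. */
typedef size_t (*length_fn)(const struct foo *);
static const length_fn length_table[KIND_COUNT] = {
    [KIND_STRING] = string_length,
    [KIND_ARRAY]  = array_length,
};

/* Direct call: the target is fixed at compile time ("call string_length"). */
size_t bar_direct(const struct foo *f)
{
    return string_length(f);
}

/* Indirect call: the target is loaded from the table at run time, so the
 * compiler typically emits something like "call *%rax". */
size_t bar(const struct foo *f)
{
    assert(f->kind == KIND_STRING || f->kind == KIND_ARRAY);
    return length_table[f->kind](f);
}
```

Calling `bar` on a string-typed `foo` and on an array-typed `foo` goes through the same call site but ends up in different functions, which is exactly the kind of call the indirect branch predictor has to guess at.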

What we want, to mitigate Spectre, is to be able to disable speculative execution for indirect jumps. The CPU doesn't provide a clean way to do that directly.

So we just stop using the indirect jump instructions. Instead, we abuse the fact that "ret" is an indirect jump.

"Call" and "ret" are how CPUs support function calls. When you "call" a function, the CPU pushes the return address --- the next instruction address after the "call" --- to the stack. When you return from a function, you pop the return address and jump to it. There's a sort of "jmp %register" hidden in "ret".

You abuse "ret" by replacing indirect jumps with a sequence of call/mov/ret, where the mov does a switcheroo on the saved return address so that the "ret" lands on the real target.

The obvious next question to ask here is, "why don't CPUs predict and speculatively execute rets?" And, they do. So the retpoline mitigates this: instead of just "call/mov/ret", it does "call/...pause/jmp.../mov/ret", where the middle sequence of instructions set off in "..." is jumped over and never executed architecturally, but captures the speculative execution that the CPU does. The CPU predicts that the "ret" will return to the instruction right after the original "call", and does not know how to predict around the fact that we did the switcheroo on the return address.
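The mismatch between what the CPU predicts and where the "ret" really goes can be sketched with a toy model in C. This is not a real retpoline and it executes no special instructions: it only models, under the assumption that return prediction comes from a private return stack the "mov" cannot touch, why speculation lands in the capture loop while real execution reaches the target. All names and the made-up addresses are hypothetical; the assembly in the comment is the commonly published shape of the sequence.

```c
#include <assert.h>

/* Made-up "addresses" for the labels in the retpoline sequence:
 *
 *       call set_up_target
 *   capture:
 *       pause
 *       jmp capture
 *   set_up_target:
 *       mov %r11, (%rsp)    ; overwrite the saved return address
 *       ret
 */
enum { CAPTURE = 0x10, SET_UP_TARGET = 0x20, REAL_TARGET = 0x99 };

struct toy_cpu {
    int stack[8]; int sp;        /* architectural stack in memory */
    int rsb[8];   int rsb_top;   /* predictor's private return stack */
};

/* "call": the return address goes to memory AND to the predictor. */
void toy_call(struct toy_cpu *c, int return_addr)
{
    assert(c->sp < 8 && c->rsb_top < 8);
    c->stack[c->sp++] = return_addr;
    c->rsb[c->rsb_top++] = return_addr;
}

/* "mov %r11, (%rsp)": only memory changes; the predictor is not told. */
void toy_overwrite_return(struct toy_cpu *c, int target)
{
    assert(c->sp > 0);
    c->stack[c->sp - 1] = target;
}

struct ret_outcome { int speculated; int actual; };

/* "ret": speculation follows the predictor, real execution follows memory. */
struct ret_outcome toy_do_ret(struct toy_cpu *c)
{
    struct ret_outcome r;
    assert(c->sp > 0 && c->rsb_top > 0);
    r.speculated = c->rsb[--c->rsb_top];
    r.actual = c->stack[--c->sp];
    return r;
}

/* Runs the whole sequence for an indirect jump to `target`. */
struct ret_outcome toy_retpoline(int target)
{
    struct toy_cpu c = {0};
    toy_call(&c, CAPTURE);             /* call set_up_target */
    toy_overwrite_return(&c, target);  /* mov %r11, (%rsp)   */
    return toy_do_ret(&c);             /* ret                */
}
```

In this model `toy_retpoline(REAL_TARGET)` reports `CAPTURE` as the speculated destination and `REAL_TARGET` as the actual one: the speculative path spins harmlessly in the pause/jmp loop until the CPU notices the misprediction.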

How'd I do?


kibwen 3088 days ago

> Or, another example is switch statements, which can compile down to jump tables.

Is the overhead of the retpoline such that it's no longer a benefit to compile switches to jump tables?


StavrosK 3088 days ago

Pretty well, thanks. What I'm wondering is: the attack uses the data pulled into the cache by a speculatively executed indirect jump to mount a timing attack and discover what that data is, correct? Why can't the CPU mark the cache lines it fetched during the speculative jump as "stale" and discard them? Why wouldn't that fix the problem?


ahh 3088 days ago

I don't know any way to leave enough breadcrumbs to do that in four clock cycles, do you?


nwmcsween 3088 days ago

Apparently Intel is going to release a microcode update that adds control over the BTB (branch target buffer).


revelation 3088 days ago

retpoline is just a convoluted way of doing an indirect jump/call designed to make branch prediction entirely useless. It's a novel concept because doing this is completely opposite to making a program run faster.

Here is an example of the most common programming patterns that end up causing indirect jumps/calls:

https://godbolt.org/g/eThmnG

Imagine every virtual function call in a C++ program being mispredicted and taking twice as long.

(Instead of forcing us to recompile the world, maybe Intel should just disable branch prediction in microcode.)


littlestymaar 3088 days ago

> Imagine every virtual function call in a C++ program being mispredicted and taking twice as long.

> (Instead of forcing us to recompile the world, maybe Intel should just disable branch prediction in microcode.)

Wouldn't the performance impact be dramatic? In this[1] example there's a 6x slowdown between the cases with and without correct branch prediction.

[1]: https://stackoverflow.com/questions/11227809/why-is-it-faste...


nothrabannosir 3088 days ago

I don’t think this is about all branch prediction, just about indirect jump target prediction. Like, “jump to %rax, but don’t try to guess what %rax is before you’re 100% certain.” Not the same as “jump to a known location if you think this here register is true/false”. As far as I can piece together, the exploit relies on making the branch predictor think the branch target will be somewhere you stored malicious code, which will then be executed by another process, e.g. a kernel. If it does harm before the branch predictor catches that it was wrong, you’re home free.

I’m not sure, but that’s what it looks like, so far.


scatters 3088 days ago

This doesn't affect branches, only indirect jumps (and calls). The performance impact will still be considerable. It will make PGO more crucial (or smarter JITting, for VM languages) since the penalty can be avoided by prefixing an indirect call with a direct call to the most likely target - this is a well-known technique, useful on machines with weaker jump predictors than branch predictors.

Quite possibly, the worst affected code will be OO code that is dynamically (open) polymorphic.
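The "direct call to the most likely target" trick from this comment can be sketched in C as follows. The handler names and the choice of which one is "hot" are hypothetical; in practice that choice would come from profile data (PGO) or a JIT's runtime statistics, and a compiler may apply the transformation on its own.

```c
#include <assert.h>

typedef int (*handler_fn)(int);

/* Hypothetical targets: one assumed to be hot, one assumed to be rare. */
int common_handler(int x) { return x + 1; }
int rare_handler(int x)   { return x * 2; }

/* Compare the pointer against the expected target first. On a match the
 * call is a direct call guarded by an ordinary conditional branch, which
 * the retpoline does not affect. Only the fallback pays for the indirect
 * call. */
int dispatch(handler_fn fn, int x)
{
    assert(fn != 0);
    if (fn == common_handler)
        return common_handler(x);   /* direct call to the likely target */
    return fn(x);                   /* indirect call for everything else */
}
```

Both `dispatch(common_handler, 1)` and `dispatch(rare_handler, 4)` return the same results as calling the handlers directly; only the path taken to reach them differs.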


contrarian_ 3088 days ago

FWIW, all branches will need to be followed by a memory load fence (lfence), which seems to trip up speculative execution on Intel CPUs.


revelation 3088 days ago

There is still branch prediction for normal calls and jumps, which are the majority, and should be in performance-conscious code.

It's just that some language features such as virtual functions in C++ often require indirect invocation when the compiler can't devirtualize a call, and there is lots of it in the kernel in performance-critical paths (think interrupts, syscalls).


blattimwind 3088 days ago

There are two paragraphs dedicated to performance impact in the linked PR.


sanxiyn 3088 days ago

By design, with retpoline indirect branches won't be able to take advantage of branch prediction. This is nontrivial, but can't be helped. Performance impact should be negligible otherwise.


dingo_bat 3088 days ago

Considering that every single function call into a dynamically loaded library will be affected, that negligible "otherwise" won't be so negligible in the real world.
