It's not that simple. The problem is not just branches but often the intersection of memory and branches. For example, a really powerful technique for amplification is this:

    ldr x2, [x2]
    cbnz x2, skip
    /* bunch of slow operations */
    ldr x1, [x1]
    add x1, x1, CACHE_STRIDE
    ldr x1, [x1]
    add x1, x1, CACHE_STRIDE
    ldr x1, [x1]
    add x1, x1, CACHE_STRIDE
    ldr x1, [x1]
    add x1, x1, CACHE_STRIDE
  skip:

Here, if the branch is predicted not taken and ldr x2 misses in the cache, the CPU will speculatively execute long enough to launch the four other loads. If ldr x2 hits in the cache, the branch will resolve before those loads execute. This gives us a 4x signal amplification (one line's cache state becomes the cache state of four lines) using absolutely no external timing, just exploiting the fact that misses lead to longer speculative windows.

After repeating this procedure enough times and amplifying your signal, you can then directly measure how long it takes to load all these amplified lines (no mispredicted branches required!). Simply start the clock, load each line one by one in a for loop, and then stop the clock.
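To make that measurement step concrete, here is a rough sketch in C. It is only an illustration, not anyone's actual exploit code: CACHE_STRIDE's value, NUM_LINES, now_ns and time_probe are all names and numbers I made up, and it assumes a POSIX clock_gettime is available. A real attack would likely use a cycle counter and barriers instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define CACHE_STRIDE 4096 /* assumption: byte distance between amplified lines */
#define NUM_LINES 4       /* matches the four loads in the gadget above */

/* Monotonic timestamp in nanoseconds (hypothetical helper). */
uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Start the clock, load `count` lines spaced `stride` bytes apart,
 * stop the clock. Returns the elapsed time in nanoseconds. */
uint64_t time_probe(const volatile uint8_t *base, size_t count, size_t stride) {
    assert(stride > 0);
    uint64_t start = now_ns();
    for (size_t i = 0; i < count; i++) {
        (void)base[i * stride]; /* volatile read: one load per line */
    }
    return now_ns() - start;
}
```

You would call something like time_probe(buf, NUM_LINES, CACHE_STRIDE) and compare the result against a threshold calibrated on lines you know are cached versus lines you know are not. With more amplified lines the gap between the two cases grows, which is why even a coarse clock can be enough.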

As I mentioned earlier, unless your plan is to treat every hit as a miss to DRAM, you can't hide this information.

The current sentiment for Spectre mitigations is that once information has leaked into side channels, you can't do anything to stop attackers from extracting it. There are simply too many ways to expose microarchitectural state (and caches are not the only side channels!). Instead, your best and only bet is to prevent important information from leaking in the first place.