by artisanspam 1067 days ago
Why does disabling SMT not fully prevent this? I don't know the details of Zen 2 architecture, but register files are usually implemented as SRAM on the CPU-die itself. So unless the core is running SMT, I don't understand how another thread could be accessing the register file to write a secret.

Because unless you pin the threads to certain CPU cores (e.g. in Linux by using the taskset command, or in Windows by using the Set Affinity command in Task Manager), they are migrated very frequently between cores.

So even with SMT disabled, each core executes many threads one after another, switching from one thread to the next every few milliseconds. A context switch does not modify the hidden registers; it only restores the architecturally visible ones.
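For reference, the pinning mentioned above can also be done from inside a program. This is a minimal sketch, assuming Linux and Python's `os.sched_setaffinity`; the helper name `pin_current_process` is made up for illustration. Note that it only restricts where this process runs; it does not stop other threads from being scheduled on the same core.

```python
import os

def pin_current_process(core):
    """Restrict the calling process to one core and return the new affinity set.

    Hypothetical helper for illustration; relies on the Linux-only
    os.sched_getaffinity / os.sched_setaffinity calls.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("os.sched_setaffinity is only available on Linux")
    allowed = os.sched_getaffinity(0)  # pid 0 means the current process
    if core not in allowed:
        raise ValueError(f"core {core} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {core})
    return os.sched_getaffinity(0)

if __name__ == "__main__":
    original = os.sched_getaffinity(0)
    print("pinned to:", pin_current_process(min(original)))
    os.sched_setaffinity(0, original)  # restore the previous affinity
```

This is roughly what `taskset` does for an already running process.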

Pinning doesn’t help either, since there will always be more threads than cores. Scheduling all those threads and even blocking on IO will cause context switches.
I do not know how that is done in Windows, but in Linux it is possible to reserve some cores to be used only for the threads that you assign to them and for no other threads.

This is done frequently for high-performance applications.

The pigeonhole principle does not stipulate which hole the extra pigeons have to appear in. Only that at least one hole must have more than one pigeon. It does not stipulate that all holes have to have pigeons; you can have 999 empty pigeon holes, and then a hole that has 1001 pigeons in it. The pigeonhole principle doesn't care.

In Linux it's possible to stipulate that, for instance, core 7 can only be used by super secret process PID 1234. If you have 400 other threads, that means the other threads will have to compete for cores 0-6. And if super secret PID 1234 is idle and there are 12 threads that are marked for scheduling, then they get to just wait for cores 0-6 to become available while core 7 stands idle.
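To make the core 7 example concrete, here is a toy model of one scheduling tick. It is only a sketch: the function `schedule` and its arguments are invented for illustration and have nothing to do with how the Linux scheduler is actually implemented.

```python
def schedule(runnable, num_cores=8, reserved=None):
    """Place runnable PIDs on cores for one tick, honouring reservations.

    `reserved` maps a core number to the only PID allowed to run there.
    Returns (placement, waiting): placement maps core -> PID or None (idle),
    waiting lists the PIDs that found no core this tick.
    """
    reserved = reserved or {}
    reserved_pids = set(reserved.values())
    placement = {}
    # A reserved core runs its owner if the owner is runnable, else it idles.
    for core, pid in reserved.items():
        placement[core] = pid if pid in runnable else None
    # Everyone else competes for the unreserved cores.
    free_cores = [c for c in range(num_cores) if c not in reserved]
    general = [p for p in runnable if p not in reserved_pids]
    for core, pid in zip(free_cores, general):
        placement[core] = pid
    for core in free_cores[len(general):]:
        placement[core] = None
    waiting = general[len(free_cores):]
    return placement, waiting

if __name__ == "__main__":
    # PID 1234 is idle, 12 other threads are runnable, core 7 is reserved.
    placement, waiting = schedule(list(range(100, 112)), reserved={7: 1234})
    print(placement[7])   # None: core 7 stays idle
    print(len(waiting))   # 5 threads wait for cores 0-6
```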

I watched a talk several years ago about an HFT firm that ... abused? this principle. They had a big ass monster of a machine. Four sockets, four CPUs with gobs of cores and gobs of cache on each one. But the only thing they cared about was the latency of their HFT trade sniping process. If they could reduce the latency from receiving interesting information to executing a trade on that information from (making up numbers) 1.1ms to 0.9ms, that was potentially thousands, even millions, of dollars in profit.

So if CPU socket 0 has cores 0-15, CPU socket 1 has cores 16-31, CPU socket 2 has cores 32-47, CPU socket 3 has cores 48-63, they marked cores 17-31,33-47,49-63 to be usable by nothing. Those cores are permanently and forever idle. They will never execute a single instruction. Ever. Core 16 can be used by PID 12345 and only by PID 12345, core 32 can be used by PID 7362 and only PID 7362, and core 48 can be used by PID 8765 and only PID 8765. This ensured that all data and all instructions used by their super high priority HFT process can never, ever be evicted from the cache.

Apparently it made a notable improvement in latency and therefore profit.
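The core layout described above is easy to get wrong by one, so here is a small sketch that just computes the three groups. The function name `hft_core_plan` is hypothetical, and the talk did not say which mechanism the firm used to enforce the layout; this only does the arithmetic.

```python
def hft_core_plan(sockets=4, cores_per_socket=16):
    """Split cores into (general, dedicated, idle) sets for the layout above.

    Socket 0 is left to the general scheduler. On every other socket the
    first core is dedicated to one process and the rest are kept idle, so
    nothing else competes for that socket's cache.
    """
    general = set(range(cores_per_socket))
    dedicated, idle = set(), set()
    for socket in range(1, sockets):
        first = socket * cores_per_socket
        dedicated.add(first)
        idle.update(range(first + 1, first + cores_per_socket))
    return general, dedicated, idle

if __name__ == "__main__":
    general, dedicated, idle = hft_core_plan()
    print(sorted(dedicated))  # [16, 32, 48]
    print(len(idle))          # 45 cores that never run anything
```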

That does not apply when some cores are reserved for manual thread assignment, because the scheduler no longer throws pigeons in those holes, but schedules threads only on the other cores.
Because the context switch only affects architectural state, not microarchitectural state.
Yes, I understand that, but I was struggling to think of a sequence of instructions that would cause this secret leaking on a single thread.

But a simple example is `vzeroupper` followed by anything that writes a secret to the same register file entry: that secret would then be leaked on a subsequent flush.

It depends a bit on the exact details of the implementation, but there are several possibilities imaginable.

For example, a failed speculation of vzeroupper could result in it erroneously claiming a register by clearing the zero flag on the wrong register - which would mean that the previous data of that register is now suddenly available. If that register has not been touched since a context switch, it could leak data from another process.

The linked article has an animation which suggests that it clears the zero flag on the previously-used register - which indeed requires the victim to reuse the register in the small amount of time between it being marked as zero and the zero being cleared again.

However, the linked GitHub repo states:

> The undefined portion of our ymm register will contain random data from the register file. [..] Note that this is not a timing attack or a side channel, the full values can simply be read as fast as you can access them.

This suggests that it does indeed do something akin to clearing the zero flag of a random register.

That's not quite right. The attacker does the vzeroupper rollback. Any registers in the physical file that haven't been overwritten can be exposed as a result, regardless of what the victim did.
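A toy model may help picture what is being described here. It is emphatically not AMD's implementation: the class `ToyRegisterFile`, its methods, and the assumption that a rollback simply clears a per-entry zero flag are all invented to mirror the comments above. The point it illustrates is that the stored bits survive, and only the flag decides whether a read returns zero or the stale value.

```python
class ToyRegisterFile:
    """Toy model only: entries keep stale data and a zero flag hides it."""

    def __init__(self, size):
        self.data = [0] * size
        self.zero_flag = [True] * size

    def write(self, entry, value):
        self.data[entry] = value
        self.zero_flag[entry] = False

    def read(self, entry):
        return 0 if self.zero_flag[entry] else self.data[entry]

    def vzeroupper(self, entry):
        # Assumption: modelled as setting the flag; the stored bits are not wiped.
        self.zero_flag[entry] = True

    def rollback(self, entry):
        # Assumption: the buggy undo clears the flag without checking
        # whose data currently sits in the entry.
        self.zero_flag[entry] = False


def demo():
    rf = ToyRegisterFile(size=4)
    rf.write(2, 0xC0FFEE)   # victim leaves a secret in entry 2
    rf.vzeroupper(2)        # attacker side: entry 2 now reads as zero
    before = rf.read(2)
    rf.rollback(2)          # attacker-triggered rollback exposes stale data
    after = rf.read(2)
    return before, after

if __name__ == "__main__":
    print(demo())  # (0, 12648430), i.e. the secret 0xC0FFEE is readable again
```

In this model the victim does nothing special: whatever was last written to the entry and not yet overwritten is what the attacker reads back.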