Unpatched systems can leak SIMD/FP state between privilege levels. Pretty fucking high severity since that's where we stick private keys these days.
The cost is more expensive context switches currently since we'll have to fully unload and reload all SIMD/FP state. I'm sure Intel will fix this one in a couple gens.
The processor has XSAVE (the mechanism that we use to save/restore FPU state and more these days) optimizations internal to the processor that keep it from having to fully reload the FPU state. OSes like Linux have not been doing lazy FPU switching on processors with these optimizations for a long time.
See information about XSAVEOPT and the "Init and Modified Optimizations" in the SDM: intel.com/sdm .
As @luto said above, recent versions of Linux ripped out the lazy handling entirely.
This is unrelated and requires new patches. Somewhere else in the thread here, someone is saying that Linux isn't vulnerable, but I don't know for sure.
AFAICT it's not related to OOO execution, but rather to not cleaning normally accessible state after normal execution. No timers or tricky memory reads needed.
Yeah, probably if I'm understanding it right. This one seems less correlated with OoO than Meltdown or Spectre. It looks like you're just issuing a load based on FP register state you shouldn't touch. The system eventually notices it needs to have issued an exception based on the access to the FP register and eventually does quash the load within the core but the the load has already gone out to the memory hierarchy where you can see its effects on cache levels even though it never completes. Larger cores are harder to stop all at once than smaller cores so being OoO should correlate with being vulnerable but basically any pipelined processor that can throw an interrupt on register access could theoretically be vulnerable to something like this.
I think you probably do need to execute the should-fault FP access in a not-executed speculatively executed branch (à la Meltdown), so that the exception doesn't actually fire and the kernel doesn't reload your own FP state.
(Since you can only learn a small part of the state each time, you need to have the other processes state remain in the FPU while you repeat the process to learn the entire AES key or whatever).
It might be that the check of who can access the FP registers versus what the identity of the current process is takes a few clock cycles due to communication across the core and Intel didn't want to slow down the critical path register load for this since they figured they could just squash any improper execution later. But it might also be as you say.
The cost is more expensive context switches currently since we'll have to fully unload and reload all SIMD/FP state. I'm sure Intel will fix this one in a couple gens.