by bloorp 3095 days ago
As I read through the Meltdown paper, it looks really difficult to have the security we want and the performance we want at the same time. It's pretty crazy, but here's my limited understanding:

There's a huge buffer shared between two threads: 256 * 4K. One thread reads a byte of kernel memory, literally any byte it wants, and then reads from the one 4K page in that buffer whose index matches the value of the byte it just read, which pulls that page's data into the cache. Then at some point the CPU determines that the thread shouldn't be permitted to access the kernel memory location and rolls back all of that speculative execution, but the cache state isn't affected by the rollback.

The other thread iterates through those 256 pages, timing how long it takes to read from each page, and the one page that Thread A accessed will have a different (shorter?) timing because it's cached already. It now understands one byte of kernel memory that it shouldn't. That's just one byte but the whole process is so fast that it's easy to just go nuts on the whole kernel address space.
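The decoding step described above can be sketched in C. This is only a model of the bookkeeping, not an exploit: it assumes the per-page latencies have already been measured somehow, and the names `probe_offset` and `recover_byte` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PROBE_PAGES 256   /* one page per possible byte value */
#define PAGE_SIZE   4096  /* the 4K stride from the description above */

/* Offset into the probe buffer that the speculative access would touch
 * for a given secret byte (hypothetical helper). */
size_t probe_offset(uint8_t secret) {
    return (size_t)secret * PAGE_SIZE;
}

/* Given one measured access latency per page, return the index of the
 * fastest page. If exactly one page was cached, that index is the
 * leaked byte value. */
int recover_byte(const uint64_t latency[PROBE_PAGES]) {
    assert(latency != NULL);
    int best = 0;
    for (int i = 1; i < PROBE_PAGES; i++) {
        if (latency[i] < latency[best]) {
            best = i;
        }
    }
    return best;
}
```

In a real measurement the timings are noisy, so one would presumably repeat the probe and take a majority vote rather than trust a single minimum.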

So what would the fixes be? Disable speculative execution? Only do it if the target memory location is within userspace, or within the same space as the executing address? Plug all of the side-channel information leak mechanisms? I dunno.

5 comments

Keep a small pool of cache lines exclusive to speculative execution, discard them when the speculated path is not taken, and rename the affected cache lines (like register renaming, so no copy) when it is taken.
Also, separate BTB for each process and privilege level.
Yes, this would have a bonus effect of actually gaining IPC in multi process loads.
In the simplest Meltdown case, the offending instruction is really executed and a General Protection Fault occurs. That is handled in the kernel which at that point could (simply?) flush all caches to remove the leaked information.

The real problem with Meltdown seems to occur when: 1) The offending instruction is NOT really executed because it is in a branch which is not actually taken. 2) The offending instruction is executed but within a transaction, which leads to an exception-free rollback (with leaked information left in cache though).

AFAIK neither is (or can be made) visible to the kernel (which could explain the very large PTI patch), but I do wonder if they are events that can be handled at the microcode level, in which case a microcode update from Intel could mitigate them.

The MELTDOWN one is the easy one (as is evident by the fact that this is the one that only seems to affect Intel CPUs).

When a load is found to be illegal, an exception flag is set so that if the instruction is retired (ie. the speculated execution is found to be the actual path taken), a page fault exception can be raised. To prevent MELTDOWN, at the same time that the flag is raised you can set the result of the load to zero.
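A toy software model of that idea, purely as a sketch: the `load_result` struct, `model_load`, and `probe_index` are hypothetical names, and real hardware does this in the pipeline, not in C. The point is that once an illegal load forwards zero, the dependent probe access always lands on page 0 and tells the attacker nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Result of a modeled load: the value forwarded to dependent
 * instructions, plus the exception flag checked at retirement. */
typedef struct {
    uint8_t value;
    bool    fault_pending;
} load_result;

/* Model of the proposed fix: when the permission check fails, raise
 * the flag AND forward zero instead of the real memory contents. */
load_result model_load(const uint8_t *mem, size_t addr, bool access_allowed) {
    assert(mem != NULL);
    load_result r;
    r.fault_pending = !access_allowed;
    r.value = access_allowed ? mem[addr] : 0;
    return r;
}

/* Page index a dependent probe access would touch. */
size_t probe_index(load_result r) {
    return (size_t)r.value;
}
```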

SPECTRE is the really hard one to deal with. Part of the solution might be providing a way for software to flush the branch predictor state.

Maybe separate BTBs. Or maybe disable branch target prediction when in kernel mode (but then some VM process may still observe some other process running inside a different VM via a side channel).
Don't allow user processes to recover from a SEGV. The attack depends on a signal handler that traps the signal and resumes execution. If this is disabled then the attack will not work. This would affect two types of systems:

1. Badly written code where bugs are being masked by the handler. 2. Any kind of virtualization?

So, for cloud providers it looks like a 30% performance hit, but for the rest of us I would rather have a patch that stops applications handling the SEGV trap.

The attacks do not rely on recovering from SIGSEGV. The speculated execution that accesses out-of-bounds or beyond privilege level happens in a branch that's predicted-taken but actually not-taken, so the exception never occurs.
Ah, ok - then I read the paper wrongly. I’ll go back and have another look.

Edit: yes, I missed the details in section 4.1 when I skimmed through. I’m not familiar with the Kocher paper, but I assume the training looks like this?

for (int i = 0; i < n; i++) if (i == n - 1) do_probe();
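To see why a loop like this trains the predictor, here is a minimal sketch using a textbook 2-bit saturating counter. Real branch predictors are far more elaborate, so treat this as an assumption-laden model; `counter2`, `predict`, `update`, and `count_mispredictions` are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>

/* 2-bit saturating counter: states 0 and 1 predict not-taken,
 * states 2 and 3 predict taken. */
typedef struct {
    unsigned state;
} counter2;

bool predict(const counter2 *c) {
    return c->state >= 2;
}

void update(counter2 *c, bool taken) {
    if (taken) {
        if (c->state < 3) c->state++;
    } else {
        if (c->state > 0) c->state--;
    }
}

/* Feed the loop's branch condition (i == n - 1) through the counter
 * and count how many iterations are mispredicted. */
int count_mispredictions(int n) {
    assert(n >= 0);
    counter2 c = { 0 };  /* start in "strongly not-taken" */
    int misses = 0;
    for (int i = 0; i < n; i++) {
        bool taken = (i == n - 1);
        if (predict(&c) != taken) misses++;
        update(&c, taken);
    }
    return misses;
}
```

In this model every iteration but the last keeps the counter at not-taken, so the only misprediction is the final iteration, which is the window where the hardware runs down the wrong path speculatively.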

After thinking about this I think you may be right. It might be hard (or impossible) to do in practice.

> or within the same space as the executing address

That's probably a good place to start from. I'm guessing there still would be issues here with JITed code coming from an untrusted source.