Hacker News new | ask | show | jobs
by tzs 3083 days ago
> I believe one solution would be to put permission checks before the memory access, which would add serialized latency to all memory access.

I don't see why that would have to add latency to all (or any) memory access. The addresses generated by programs (except in real mode, when everything has access to everything anyway so we don't care about these issues then) are virtual addresses, so they have to be translated to get the actual memory address.

The permission information for a page is stored in the same place as the physical address translation information for that page. The processor fetches it at the same time it fethes the physical base address of the page.

They should also have the current permission level of the program readily available. That should be enough to let them do something about Meltdown without any performance impact. They could do something as simple as if the page is a supervisor page and the CPU is not in supervisor mode don't actually read the memory. Just substitute fixed data.

Note that AMD is reportedly not affected by Meltdown. From what I've read that is because they in fact do the protection check before trying to access the memory, even during speculation, and they don't suffer any performance loss from that.

Note that since Meltdown is only an issue when the kernel memory read is on the path that does NOT become the real path (because if it becomes the real path, then the program is going to get a fault anyway for an illegal memory access), the replacing of the memory accesses with fixed data cannot harm any legitimate program.

Spectre is going to be the hard one for the CPU people to fix, I think. I think they may have to offer hardware memory protection features that can be used in user mode code to protect parts of that code from other parts of that code, so that things that want to run untrusted code in a sandbox in their own processes can do so in a separate address space that is protected similar to the way kernel space is protected from user space.

It may be more complicated than that, though, because Spectre also does some freaky things that take advantage of branch prediction information not being isolated between processors. I haven't read enough to understand the implications of that. I don't know if that can be defeated just be better memory protection enforcement.

2 comments

> I don't see why that would have to add latency to all (or any) memory access. The addresses generated by programs (except in real mode, when everything has access to everything anyway so we don't care about these issues then) are virtual addresses, so they have to be translated to get the actual memory address.

L1 caches are generally virtually indexed for exactly this reason: to allow a L1 cache read to happen in parallel with the TLB lookup. (They're also usually, I believe, physically tagged, so we have to check for collisions at some point, but making sure there's no side channel information at that point is, obviously given recent events, hard.)

Indeed - Meltdown has an "easy" fix and now it's known it should be possible to design chips which are not vulnerable.

Spectre is, as you say, harder - but more because the line of what sort of state should be separate isn't as clear-cut as we might like it to be (i.e. it's not neccessarily just "processes" as the OS sees it - e.g. JVM/JavaScript interpreter state should allow for an effective sandbox between the executing interpreter/JVM process and what the running JVM/JavaScript code can see). And worse, those are precisely the cases where one probably cares most about separation given that's where untrusted code is typically run.

But hardware assistance could help - in simple terms, I'd imagine that allowing a swap out of more of the internal processor state (to the extent that one process "training" the branch-predictor doesn't impact how the branch predictor acts in another process) would be pretty effective. That might be expensive in terms of performance per-transistor/per-watt however (though probably not absolute performance).

If we're looking at hardware design changes, it really feels like what we actually need is to add a place to hold a nonce that the OS/hypervisor can set per-process/per-vm, and incorporate those bits in the CPU cache tags so cache lines never match across security boundaries, which would close the side channel used to exfiltrate information.