

	by martin1975 3084 days ago
	I'm curious whether someone can point me to any source that discusses how the next generation of CPUs that Intel, AMD, and ARM might be working on is actually going to address this and the Spectre issue architecturally. It's great that we have a potentially performance-killing fix, but the real "fix", or rather solution, is to alter the architecture. Since I'm not an EE/CE dude... is anyone aware of where such discussions on the WWW might be taking place? By the way, that PoC was intense. Makes you wonder if the NSA knew about it all along :)

8 comments

krylon 3084 days ago

> by the way, that PoC was intense. Makes you wonder if the NSA knew about it all along :)

Colin Percival found a very similar issue with Intel's implementation of SMT on the Pentium 4 in 2005: http://www.daemonology.net/papers/htt.pdf

So the general idea of using timing attacks against the cache to leak memory has been known for at least that long.

In 2016, two researchers from the University of Graz gave a talk at 33C3 where they showed that they had managed to use that technique to establish a covert channel between VMs running on the same physical host. They even managed to run ssh over that channel. https://media.ccc.de/v/33c3-8044-what_could_possibly_go_wron...
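A toy simulation can make the covert-channel idea concrete. The Python sketch below is not the researchers' code; it models one simple Flush+Reload-style variant, and the `ToyCache` class, the shared line address, and the 4/200-cycle latencies are illustrative assumptions. The sender encodes a 1 by pulling a shared line into the cache; the receiver times a reload and then flushes the line again.

```python
# Toy model; all names and latencies are illustrative assumptions.
HIT_CYCLES = 4     # assumed latency of a cache hit
MISS_CYCLES = 200  # assumed latency of a trip to main memory
SHARED_LINE = 0x40 # hypothetical line address both parties can touch

class ToyCache:
    """A set of cached line addresses shared by two 'VMs' on one host."""
    def __init__(self):
        self.lines = set()

    def access(self, line):
        """Touch a line and return how many cycles the access took."""
        cycles = HIT_CYCLES if line in self.lines else MISS_CYCLES
        self.lines.add(line)
        return cycles

    def flush(self, line):
        self.lines.discard(line)

def send_bit(cache, bit):
    # Sender encodes 1 by pulling the shared line into the cache.
    if bit:
        cache.access(SHARED_LINE)

def receive_bit(cache):
    # Receiver times a reload, then flushes so the next bit starts clean.
    fast = cache.access(SHARED_LINE) < (HIT_CYCLES + MISS_CYCLES) // 2
    cache.flush(SHARED_LINE)
    return int(fast)

def transmit(bits):
    cache = ToyCache()
    received = []
    for bit in bits:
        send_bit(cache, bit)
        received.append(receive_bit(cache))
    return received

print(transmit([1, 0, 1, 1, 0]))  # [1, 0, 1, 1, 0]
```

Real cross-VM channels may share no memory with the other VM and then rely on evicting each other's lines instead, but the principle of turning access latency into bits is the same.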

In light of that, I would be surprised if the NSA had not known about this.

tptacek 3083 days ago

Can I put a plug in again for how fucking cool the Meltdown and Spectre attacks are? They're much more interesting than just cache timing attacks, which, as you note, have been well known for at least a decade (and much earlier in the covert channel literature).

Unlike "vanilla" cache-timing attacks:

* Meltdown and Spectre involve transient instructions, instructions that from the perspective of the ISA never actually run.

* Spectre v1 undermines the entire concept of a bounds check; pre-Spectre, virtually every program that runs on a computer is riddled with buffer overreads. It's about as big a revelation as Lopatic's HPUX stack overflow was in 1995. There might not be a clean fix! Load fences after every bounds check?

* Spectre v2 goes even further than that, and allows attackers to literally pick the locations target programs will execute from. Try to get your head around that: we pay tens of thousands of dollars for vulnerabilities that allow us to return to arbitrary program locations, and Spectre's branch target injection technique lets us use the hardware to, in some sense, do that to any program. And look at the fix to that: retpolines? Compilers can't directly emit indirect jumps anymore?
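To make the bounds-check point concrete, here is a Python model of a Spectre v1 gadget plus one alternative to a load fence: clamping the index without a branch, in the spirit of the Linux kernel's `array_index_nospec()`. Python does not speculate, so the mispredicted branch is modeled explicitly. The memory layout, `LINE`, `victim_speculative`, and `leak` are hypothetical names for illustration, and `index_mask_nospec` assumes both arguments are below 2**63.

```python
MASK64 = (1 << 64) - 1
LINE = 64          # assumed cache-line size: one probe line per byte value
ARRAY1_LEN = 16
# Hypothetical layout: array1 occupies the first 16 bytes, secret data follows it.
memory = bytes(range(ARRAY1_LEN)) + b"SECRET"

def index_mask_nospec(index, size):
    """All ones if index < size, else 0, computed without a data-dependent
    branch (64-bit two's-complement semantics; index, size < 2**63)."""
    v = (index | (size - 1 - index)) & MASK64  # top bit set iff index >= size
    v -= (v >> 63) << 64                        # reinterpret as signed 64-bit
    return (~v >> 63) & MASK64                  # arithmetic shift smears the sign

def victim_speculative(x, masked):
    """Models `if (x < ARRAY1_LEN) tmp = probe[memory[x] * LINE];` when the
    predictor wrongly guesses the branch is taken: the body runs transiently,
    and all that survives is which probe line ended up in the cache."""
    if masked:
        x &= index_mask_nospec(x, ARRAY1_LEN)
    return {memory[x] * LINE}

def leak(x, masked=False):
    cached = victim_speculative(x, masked)
    # Probe phase: the one probe line that is now fast to read encodes the byte.
    return next(b for b in range(256) if b * LINE in cached)

print(bytes(leak(ARRAY1_LEN + i) for i in range(6)))  # b'SECRET'
print(leak(ARRAY1_LEN, masked=True))                  # 0, i.e. array1[0]
```

The masked version still performs the load, but speculation can only reach in-bounds data; masking adds a few ALU operations rather than stalling the pipeline the way a fence does.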

It's good that we're all recognizing how big a problem cache timing is. It was for sure not taken as seriously as it should have been outside of a subset of cryptographers. But Meltdown and Spectre are not simply cache timing vulnerabilities; they're a re-imagining of what you can do to a modern ISA by targeting the microarchitecture.

krylon 3083 days ago

> Can I put a plug in again for how fucking cool the Meltdown and Spectre attacks are?

Yes, you can! :)

I get your point. From the perspective of somebody who normally does not deal with such low-level affairs, the difference from prior cache timing attacks is not /that/ obvious. It all looks like black magic to me, even once I roughly understand how it works.

_wmd 3083 days ago

There's no reason to invent new terminology for speculative execution. Also, the mapping between the various CPU caches and real memory has been known to be imperfect since the beginning of time.

But meanwhile, yes, I can definitely agree on how fucking cool it all is.

tptacek 3083 days ago

What new terminology got invented?

_wmd 3083 days ago

Perhaps I've been living under a rock for the past 30 years, but "transient instructions" is a new term to me.

tptacek 3083 days ago

What's the "old" term for speculated instructions that aren't retired?

chx 3083 days ago

> In light of that, I would be surprised if the NSA had not known about this.

Call me a tinfoil-hat conspiracist, but the only rational explanation I can find for IBM POWER and z CPUs still being vulnerable to Spectre is the NSA forcing IBM not to fix it. I read somewhere that the z196 had three orders of magnitude more validation routines than the Intel Core of that time. It's extremely hard to believe they haven't caught this.

warkdarrior 3083 days ago

Cache timing attacks have been known for a while, for example across VMs in 2009: https://cseweb.ucsd.edu/~hovav/dist/cloudsec.pdf

ckastner 3083 days ago

This paper from 2005 describes a cache timing attack that enabled an unprivileged process to recover another process's AES key:

https://www.cs.tau.ac.il/~tromer/papers/cache.pdf

IIRC, the only way to address the issue was the addition of the AES-NI instruction set, which came a few years later.
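The core leak in these cache-timing attacks on AES can be sketched in a few lines. Table-driven AES indexes a 256-entry lookup table with `plaintext_byte ^ key_byte` in the first round; assuming 4-byte entries and 64-byte cache lines, 16 entries share a line, so learning which line was touched reveals the top four bits of the key byte when the plaintext is known. This is an idealized model: the function names are hypothetical, and it assumes the attacker observes each lookup's cache line directly, whereas real attacks recover this statistically over many encryptions.

```python
ENTRIES_PER_LINE = 16  # assumed: 4-byte table entries in 64-byte cache lines

def first_round_lines(plaintext, key):
    """Cache line of the 256-entry table touched by each first-round lookup,
    whose index is plaintext[i] ^ key[i] (idealized observation)."""
    return [(p ^ k) // ENTRIES_PER_LINE for p, k in zip(plaintext, key)]

def recover_high_nibbles(plaintext, observed_lines):
    """(p ^ k) >> 4 == (p >> 4) ^ (k >> 4), so XOR-ing out the known
    plaintext nibble leaves the key's high nibble."""
    return [line ^ (p >> 4) for p, line in zip(plaintext, observed_lines)]

key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")  # FIPS-197 example key
plaintext = bytes(range(16))
print(recover_high_nibbles(plaintext, first_round_lines(plaintext, key)))
# [2, 7, 1, 1, 2, 10, 13, 10, 10, 15, 1, 8, 0, 12, 4, 3]
```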

cesarb 3083 days ago

> IIRC, the only way to address the issue was the addition of the AES-NI instruction set, which came a few years later.

Another option would be to use a bitsliced implementation of AES, at some cost in speed. I could also imagine an implementation which read the whole table every time, using constant-time operations to select the desired element(s), but I don't know how slow that would be.
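The read-the-whole-table idea can be sketched as follows: every entry is loaded on every lookup, and a mask computed without branches keeps only the wanted one, so the set of cache lines touched no longer depends on the secret index. This is a Python illustration of the data flow only (Python itself gives no constant-time guarantees); the names are hypothetical, and the table is an arbitrary stand-in rather than the AES S-box.

```python
def eq_mask(a, b):
    """0xFF if a == b else 0x00, for 0 <= a, b < 256, without branching on the values."""
    diff = a ^ b                     # 0 iff equal, otherwise 1..255
    return ((diff - 1) >> 8) & 0xFF  # diff == 0 gives -1 >> 8 == -1, else 0

def ct_lookup(table, secret_index):
    """Touch all 256 entries and OR together only the selected one."""
    result = 0
    for i, value in enumerate(table):
        result |= value & eq_mask(i, secret_index)
    return result

table = [(7 * i + 3) & 0xFF for i in range(256)]  # arbitrary stand-in table
print(ct_lookup(table, 10), table[10])  # 73 73
```

The loop also gives a feel for the cost: 256 loads and masks per lookup instead of one.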

arkadiyt 3083 days ago

> Makes you wonder if the NSA knew about it all along :)

Former head of TAO Rob Joyce said "NSA did not know about the flaw, has not exploited it and certainly the U.S. government would never put a major company like Intel in a position of risk like this to try to hold open a vulnerability." [1]

Who knows if that's true or not, though. Certainly the U.S. government has done exactly that many times in the past (like with Heartbleed).

[1]: https://www.washingtonpost.com/business/technology/huge-secu...

SheinhardtWigCo 3083 days ago

It's odd to publicly state that they didn't know about it, because now if they don't do the same after the next big flaw comes out, the implication will be that they indeed knew and were quietly exploiting it. I thought that was why they generally don't comment on these things. The less-charitable assumption is that they'll make this claim every time regardless of whether it's true.

The claim that "the U.S. government would never put a major company like Intel in a position of risk" is obviously bullshit. TAO's job necessarily involves exposing companies both in the US and overseas to that kind of risk on a daily basis.

arrestedDevelpr 3083 days ago

Implications? Who cares what the peanut gallery thinks?

dirtbox 3083 days ago

It's the type of announcement that makes me wonder if they had the chip makers incorporate it specifically for them to exploit.

mehrdadn 3083 days ago

> It's the type of announcement that makes me wonder if they had the chip makers incorporate it specifically for them to exploit.

...sorry, what?

It makes you wonder if the NSA had chip makers incorporate speculative execution and caching because... timing attacks?

dirtbox 3083 days ago

No.

It's just that it's highly suspicious that anyone is making any type of mention of it at all.

rdtsc 3083 days ago

That is an odd one. Why say that instead of the usual "we can't comment on that"?

> U.S. government would never put a major company like Intel in a position of risk like this to try to hold open a vulnerability." [1]

They subverted the Dual_EC_DRBG standardization process. Had they not been caught and the algorithm ended up on more devices they would be hurting not just major companies but whole industries.

Also for reference: https://en.wikipedia.org/wiki/Bullrun_(decryption_program)

mpweiher 3083 days ago

<tinfoil>

Note that it talks about "the flaw", whereas Intel claims it isn't a "flaw". So could be another instance of overly specific denial. "We didn't exploit this flaw, because it isn't a flaw. We exploited the processor operating as designed".

</tinfoil>

arkh 3083 days ago

> the U.S. government would never put a major company like Intel in a position of risk like this to try to hold open a vulnerability

The US government, sure. The NSA? They sure would, as this statement shows.

amdavidson 3083 days ago

Are you arguing that the NSA does not fall under the umbrella of the "US Government"?

white-flame 3084 days ago

To my understanding, the memory subsystem is fetching a byte in parallel with access permission checks. If the byte is discarded due to mis-speculation, then the result of the permission check is ignored, but the cache is still in an updated state.

I believe one solution would be to put permission checks before the memory access, which would add serialized latency to all memory access. Another would be to have the speculative execution system flush cache lines that were loaded but ultimately ignored, which would be complex but probably not as much of a speed hit.

(edit: yeah, a simple "flush" is insufficient, it would have to be closer to an isolated transaction with rollback of the access's effects on the cache system.)

jimrandomh 3084 days ago

Flushing cache lines doesn't work, at least not straightforwardly. The attacker can arrange things so that the cache line is pre-populated with something else that it has access to, with a colliding address that will be evicted by the speculative load. Flushing undoes the load, but can't easily undo the eviction.
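This point can be reproduced with a toy direct-mapped cache. The attacker primes every set with its own lines, a transient load of a secret-dependent address evicts one of them, and the CPU then flushes the transient line. The probe still finds the set whose attacker line went missing. In this model only a true rollback that also restores the evicted line, as in white-flame's edit above, closes the channel. Cache geometry, addresses, and function names are hypothetical.

```python
NUM_SETS = 8                   # assumed tiny direct-mapped cache: one line per set
ATTACKER_BASE = 10 * NUM_SETS  # hypothetical attacker lines: ATTACKER_BASE + s maps to set s
VICTIM_BASE = 25 * NUM_SETS    # hypothetical secret-dependent victim lines

class DirectMappedCache:
    def __init__(self):
        self.sets = [None] * NUM_SETS

    def load(self, line):
        """Return True on a hit; on a miss, install the line, evicting the old one."""
        s = line % NUM_SETS
        hit = self.sets[s] == line
        self.sets[s] = line
        return hit

    def flush(self, line):
        s = line % NUM_SETS
        if self.sets[s] == line:
            self.sets[s] = None

def attack(secret_set, full_rollback=False):
    cache = DirectMappedCache()
    for s in range(NUM_SETS):               # prime: fill every set with attacker data
        cache.load(ATTACKER_BASE + s)
    victim_line = VICTIM_BASE + secret_set
    evicted = cache.sets[victim_line % NUM_SETS]
    cache.load(victim_line)                 # transient load that is later squashed
    if full_rollback:
        cache.sets[victim_line % NUM_SETS] = evicted  # undo the load and the eviction
    else:
        cache.flush(victim_line)            # undo only the load
    # Probe: any attacker line that now misses reveals the secret-dependent set.
    return [s for s in range(NUM_SETS) if not cache.load(ATTACKER_BASE + s)]

print(attack(3))                      # [3]
print(attack(3, full_rollback=True))  # []
```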

tzs 3083 days ago

> I believe one solution would be to put permission checks before the memory access, which would add serialized latency to all memory access.

I don't see why that would have to add latency to all (or any) memory access. The addresses generated by programs (except in real mode, when everything has access to everything anyway so we don't care about these issues then) are virtual addresses, so they have to be translated to get the actual memory address.

The permission information for a page is stored in the same place as the physical address translation information for that page. The processor fetches it at the same time it fetches the physical base address of the page.

They should also have the current permission level of the program readily available. That should be enough to let them do something about Meltdown without any performance impact. They could do something as simple as this: if the page is a supervisor page and the CPU is not in supervisor mode, don't actually read the memory; just substitute fixed data.

Note that AMD is reportedly not affected by Meltdown. From what I've read that is because they in fact do the protection check before trying to access the memory, even during speculation, and they don't suffer any performance loss from that.

Note that since Meltdown is only an issue when the kernel memory read is on the path that does NOT become the real path (because if it becomes the real path, then the program is going to get a fault anyway for an illegal memory access), the replacing of the memory accesses with fixed data cannot harm any legitimate program.
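The proposal above can be modeled as a load that consults the permission bit delivered with the translation before touching memory. The sketch is a simplified Python model, not a description of how any particular CPU (AMD's included) is built; the single-level page table, the `PageTableEntry` fields, and the choice of 0 as the substituted value are assumptions.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096

@dataclass
class PageTableEntry:   # hypothetical single-level page-table entry
    frame: int          # physical frame number
    supervisor: bool    # True if only supervisor (kernel) mode may access the page

def speculative_load(vaddr, page_table, phys_mem, user_mode):
    """Translate and load in one step. The permission bit arrives together with
    the frame number, so a user-mode access to a supervisor page never reads
    memory: it gets fixed data and would fault at retirement anyway."""
    pte = page_table[vaddr // PAGE_SIZE]
    if pte.supervisor and user_mode:
        return 0  # substituted value; nothing secret-dependent flows onward
    return phys_mem[pte.frame * PAGE_SIZE + vaddr % PAGE_SIZE]

phys_mem = bytearray(2 * PAGE_SIZE)
phys_mem[0:6] = b"kernel"                    # frame 0 holds kernel data
phys_mem[PAGE_SIZE:PAGE_SIZE + 4] = b"user"  # frame 1 holds user data
page_table = {0: PageTableEntry(frame=1, supervisor=False),
              1: PageTableEntry(frame=0, supervisor=True)}

print(speculative_load(PAGE_SIZE, page_table, phys_mem, user_mode=True))   # 0
print(speculative_load(PAGE_SIZE, page_table, phys_mem, user_mode=False))  # 107, i.e. ord("k")
```

A legitimate program never sees the substituted 0, because the access faults if it is ever on the real path.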

Spectre is going to be the hard one for the CPU people to fix, I think. I think they may have to offer hardware memory protection features that can be used in user mode code to protect parts of that code from other parts of that code, so that things that want to run untrusted code in a sandbox in their own processes can do so in a separate address space that is protected similar to the way kernel space is protected from user space.

It may be more complicated than that, though, because Spectre also does some freaky things that take advantage of branch prediction information not being isolated between processors. I haven't read enough to understand the implications of that. I don't know if that can be defeated just by better memory protection enforcement.

ahh 3083 days ago

> I don't see why that would have to add latency to all (or any) memory access. The addresses generated by programs (except in real mode, when everything has access to everything anyway so we don't care about these issues then) are virtual addresses, so they have to be translated to get the actual memory address.

L1 caches are generally virtually indexed for exactly this reason: to allow an L1 cache read to happen in parallel with the TLB lookup. (They're also usually, I believe, physically tagged, so we have to check for collisions at some point, but making sure there's no side channel information at that point is, obviously given recent events, hard.)
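The arithmetic behind virtually indexed, physically tagged L1 caches fits in a few lines: if the set-index bits and line-offset bits together fit inside the page offset, the set can be picked from the untranslated virtual address while the TLB lookup runs in parallel. The sketch checks that condition; the cache sizes in the usage lines are example configurations, and 4 KiB pages are assumed.

```python
def log2_exact(n):
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1

def index_fits_in_page_offset(cache_bytes, ways, line_bytes, page_bytes=4096):
    """True if every bit used to select the L1 set lies within the page offset,
    i.e. among the bits that virtual-to-physical translation leaves unchanged."""
    sets = cache_bytes // (ways * line_bytes)
    return log2_exact(sets) + log2_exact(line_bytes) <= log2_exact(page_bytes)

print(index_fits_in_page_offset(32 * 1024, 8, 64))  # True: 64 sets, 6 + 6 = 12 bits
print(index_fits_in_page_offset(64 * 1024, 4, 64))  # False: 256 sets need 8 + 6 = 14 bits
```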

tomatocracy 3083 days ago

Indeed - Meltdown has an "easy" fix and now it's known it should be possible to design chips which are not vulnerable.

Spectre is, as you say, harder - but more because the line of what sort of state should be separate isn't as clear-cut as we might like it to be (i.e. it's not necessarily just "processes" as the OS sees it - e.g. JVM/JavaScript interpreter state should allow for an effective sandbox between the executing interpreter/JVM process and what the running JVM/JavaScript code can see). And worse, those are precisely the cases where one probably cares most about separation given that's where untrusted code is typically run.

But hardware assistance could help - in simple terms, I'd imagine that allowing a swap out of more of the internal processor state (to the extent that one process "training" the branch-predictor doesn't impact how the branch predictor acts in another process) would be pretty effective. That might be expensive in terms of performance per-transistor/per-watt however (though probably not absolute performance).
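The branch-predictor half of this can be modeled with a toy branch target buffer. Without isolation, a target the attacker trains for a branch address is what the victim's branch at the same address predicts; keying entries by a context identifier, a hypothetical stand-in for the per-process state swap described above, stops the injection. Real BTBs index on only part of the address, so aliasing does not even require identical addresses; the names and addresses here are illustrative.

```python
class ToyBTB:
    """Toy branch target buffer: branch address -> last observed target."""
    def __init__(self, tag_with_context):
        self.tag_with_context = tag_with_context
        self.entries = {}

    def _key(self, context_id, branch_addr):
        # Hypothetical isolation: include the context id in the lookup key.
        return (context_id, branch_addr) if self.tag_with_context else branch_addr

    def train(self, context_id, branch_addr, target):
        self.entries[self._key(context_id, branch_addr)] = target

    def predict(self, context_id, branch_addr):
        return self.entries.get(self._key(context_id, branch_addr))

ATTACKER, VICTIM = 1, 2
BRANCH = 0x4000  # hypothetical indirect-branch address that aliases in both processes
GADGET = 0xBAD0  # hypothetical victim code the attacker wants run speculatively

def victim_prediction(tagged):
    btb = ToyBTB(tag_with_context=tagged)
    btb.train(ATTACKER, BRANCH, GADGET)  # attacker repeatedly jumps to GADGET
    return btb.predict(VICTIM, BRANCH)

print(victim_prediction(tagged=False) == GADGET)  # True: injected target crosses over
print(victim_prediction(tagged=True))             # None: no cross-process prediction
```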

tsukikage 3082 days ago

If we're looking at hardware design changes, it really feels like what we actually need is to add a place to hold a nonce that the OS/hypervisor can set per-process/per-vm, and incorporate those bits in the CPU cache tags so cache lines never match across security boundaries, which would close the side channel used to exfiltrate information.
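A minimal model of the nonce-in-the-tag idea: the OS or hypervisor loads a per-domain value on every switch, and the cache compares it as part of the tag, so a line brought in by one domain never produces a hit for another, even for shared physical memory. The class and the nonce values are hypothetical. This only addresses hits on shared lines; eviction from a shared set, as jimrandomh describes above, would still need separate treatment.

```python
class NonceTaggedCache:
    """Toy fully-associative cache whose tags include a per-domain nonce
    (a hypothetical register set by the OS/hypervisor)."""
    def __init__(self):
        self.tags = set()
        self.current_nonce = 0

    def switch_domain(self, nonce):
        self.current_nonce = nonce  # set on every process/VM switch

    def access(self, phys_line):
        tag = (self.current_nonce, phys_line)
        hit = tag in self.tags
        self.tags.add(tag)
        return hit

SHARED = 0x1000  # hypothetical physical line mapped by both domains

cache = NonceTaggedCache()
cache.switch_domain(0xA)
cache.access(SHARED)                 # victim touches the shared line
cache.switch_domain(0xB)
attacker_hit = cache.access(SHARED)  # attacker probes the same physical line
cache.switch_domain(0xA)
victim_hit = cache.access(SHARED)    # victim's own line is still cached
print(attacker_hit, victim_hit)  # False True
```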

thisoneforwork 3084 days ago

Would "flushing on ignore" not leave the cache side channel open for many instructions before the abort?

martin1975 3084 days ago

the first approach sounds kind of expensive to be done at the cpu level. I like your second one better. thank you!

dannyw 3083 days ago

AMD already takes the first approach to prevent Meltdown.

white-flame 3084 days ago

Actually, my preferred solution would be to eliminate the notion of distributing machine code binaries entirely, but that's a bit beyond the scope of these discussions. ;-)

martin1975 3084 days ago

so run everything in a VM?

white-flame 3084 days ago

No, creating a block of machine code bytes to execute would be a privileged operation. All code would run through a privileged CPU-specific compiler first, and there'd be no way to run raw machine code bytes otherwise.

If there are bugs that can be exposed through various machine code patterns, the compiler can centralize the restrictions of what may be executed, enforce runtime checks, or prevent certain instructions from being used at all. Security or optimization updates would affect all running programs automatically. Granted, these current speculative vulnerabilities would be much more difficult to statically detect.
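A minimal sketch of such a privileged compiler pass, assuming a toy instruction list where each instruction is a tuple. Only `lfence`, `clflush`, and `rdtsc` correspond to real x86 instructions; the other opcodes, the deny list, and the policy of fencing after every bounds check are hypothetical choices that echo the "load fences after every bounds check?" question earlier in the thread. As noted above, statically spotting speculative gadgets in general is much harder than this.

```python
# Hypothetical policy: deny untrusted code a cache flush and a precise timer.
FORBIDDEN = {"clflush", "rdtsc"}

class RejectedProgram(Exception):
    pass

def privileged_compile(program):
    """Check and harden untrusted toy IR before it can become machine code:
    reject deny-listed instructions and insert a speculation barrier after
    every bounds check."""
    out = []
    for instr in program:
        opcode = instr[0]
        if opcode in FORBIDDEN:
            raise RejectedProgram(f"instruction {opcode!r} is not allowed")
        out.append(instr)
        if opcode == "bounds_check":
            out.append(("lfence",))
    return out

untrusted = [("bounds_check", "x", "len1"), ("load", "y", "array1", "x")]
print(privileged_compile(untrusted))
# [('bounds_check', 'x', 'len1'), ('lfence',), ('load', 'y', 'array1', 'x')]
```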

But it would follow the crazy Gentoo dream of having everything optimized for your environment better, allow much better compatibility across systems, and prevent entire classes of privilege escalation issues.

Terr_ 3084 days ago

> no way to run raw machine code bytes otherwise [...] restrictions of what may be executed, enforce runtime checks, or prevent certain instructions from being used at all [...] everything optimized for your environment better, allow much better compatibility across systems and prevent entire classes of privilege escalation issues.

So... basically re-inventing Java? :)

"Raw machine code bytes" aren't distributed but are produced by the privileged JVM and its just-in-time compiler, the byte-code verifier enforces restrictions on data-access patterns and on where instructions can be used, the JVM for a particular OS has optimizations for that environment, and sandboxing (while imperfect) blocks some classes of privilege escalation issues.

Don't get me wrong, I'm not saying Java is perfect or that the underlying goal isn't good, I'm just happily amused by this sense of "everything old is new again."

dreish 3084 days ago

I've been thinking along the same lines for the last few years. If you did this, you could have a multi-user operating system in a single address space and avoid the cost of interrupts for system calls (which would just be like any other function call).

mykull 3083 days ago

We'd need a better binary representation of uncompiled code, then. Moving around lots of code as ASCII is kind of suboptimal... I wouldn't want that. By all means, show it as text to the user, but don't store it that way.

martin1975 3084 days ago

and what if I wrote a compiler that doesn't heed any of your security concerns? It would still compile to machine code and continue to be able to exploit things Spectre/Meltdown style? Or am I off here?

martin1975 3084 days ago

cool .. I think I get it. It's like compiler/instruction based DRM ... CPU specific permission to run code. Maybe they can leverage existing TPM chips to do this...

I just don't want to see performance being decimated as a trade off for security, if at all possible.

spullara 3084 days ago

I'm not so sure. Memory accesses are so slow (hundreds of cycles) that it probably wouldn't be much slower to issue them a few instructions later. When it was introduced, memory latency was only a few cycles, so it saved a huge amount of time.

gpderetta 3084 days ago

Main memory accesses take on the order of a hundred cycles. D1 cache hits usually take 3-4 cycles. Microarchitecture designers will make heroic efforts to shave even a single cycle here. Adding an overhead of even a couple of cycles would be a huge deal.

Having said that, AMD CPUs are the existence proof that you can be immune to Meltdown with no significant overhead.

Spectre is a completely different issue though.
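The cycle counts above can be turned into a back-of-the-envelope average memory access time, AMAT = hit time + miss rate × miss penalty. The sketch assumes a hypothetical 2% L1 miss rate, a 4-cycle hit, and a flat 100-cycle miss penalty, and ignores L2/L3 and out-of-order latency hiding, so the numbers are only indicative.

```python
def amat(hit_cycles, miss_rate, miss_penalty_cycles):
    """Average memory access time: every access pays the hit latency and the
    fraction that misses also pays the miss penalty."""
    return hit_cycles + miss_rate * miss_penalty_cycles

base = amat(4, 0.02, 100)    # 4-cycle L1 hit; the 2% miss rate is an assumption
slower = amat(6, 0.02, 100)  # hypothetical: 2 extra cycles on every access
print(f"{base:.1f} -> {slower:.1f} cycles ({slower / base - 1:.0%} slower)")
# 6.0 -> 8.0 cycles (33% slower)
```

With a high hit rate, the hit latency dominates, which is why a couple of cycles on the L1 path matter more than the hundreds of cycles on a miss might suggest.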

londons_explore 3083 days ago

AMD CPUs have pretty poor single threaded performance.

Perhaps that's because they haven't taken the speed shortcuts that Intel took...?

jandrese 3083 days ago

Didn't Ryzen close the single thread performance gap quite a bit?

But yeah, protecting against it means implementing memory protection in more places in the CPU. More gates and the possibility of becoming a bottleneck.

arcticbull 3083 days ago

With Ryzen, they're pretty much equal on an IPC basis.

thisoneforwork 3084 days ago

Not a CPU designer, but my guess is that they will move the cache management logic from the MMU to the µOP scheduler, which will commit to cache on retirement of the speculatively executed instruction. They would then need to introduce some sort of L0 cache, accessible only at the microarchitectural level, bound to a speculative flow, and flushed at retirement.

sspiff 3084 days ago

How does this work for two instructions in the pipeline at the same time that refer to the same cache line? If the second instruction executes the read phase before the first is retired/committed to cache, you would be hit by two memory fetch latencies, significantly hurting performance.

I guess compilers could pad that out with no-ops to postpone the read until the previous commit is done, if they know the design of the pipeline they are targeting. But generically optimized code would take a terrible hit from this.

thisoneforwork 3084 days ago

Firstly, thanks for the question. As mentioned, not a CPU designer or trying to teach Intel what to do. More like relying on the hive mind to see if I have the right idea.

A second instruction in the pipeline would read from the above mentioned L0 cache (let us call it load buffer), much like it would for tentative memory stores from the store buffer.

Also, two memory fetches in parallel don't take twice as long as one, if that were the solution (which I guess it would not be, as I imagine race conditions appearing).
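The load-buffer idea in this sub-thread can be sketched as a small Python model. Lines fetched by speculative loads are parked in a side buffer, later speculative loads of the same line are forwarded from it (so, in this model, the second load does not pay a second memory latency), retirement commits a line to the cache, and a mispredict drops the offending µOP and everything younger. The class, method names, and string results are hypothetical, and the model does not settle the question below about timing differences while loads are still in flight.

```python
class SpeculativeLoadBuffer:
    """Toy 'L0' buffer (hypothetical): lines loaded speculatively become
    visible in the real cache only when a uop that loaded them retires."""
    def __init__(self):
        self.cache = set()  # architecturally visible cache lines
        self.pending = {}   # line -> ids of in-flight uops that loaded it

    def load(self, uop_id, line):
        """Return where the data came from: 'cache', 'buffer', or 'memory'."""
        if line in self.cache:
            return "cache"
        source = "buffer" if line in self.pending else "memory"
        self.pending.setdefault(line, []).append(uop_id)
        return source

    def retire(self, uop_id):
        for line, owners in list(self.pending.items()):
            if uop_id in owners:
                self.cache.add(line)  # commit: the fill becomes visible
                owners.remove(uop_id)
                if not owners:
                    del self.pending[line]

    def squash_from(self, first_bad_uop):
        """Mispredict: drop this uop and every younger one (ids follow program order)."""
        for line in list(self.pending):
            survivors = [u for u in self.pending[line] if u < first_bad_uop]
            if survivors:
                self.pending[line] = survivors
            else:
                del self.pending[line]  # the fill never reaches the cache

lb = SpeculativeLoadBuffer()
print(lb.load(1, 0x100), lb.load(2, 0x100))  # memory buffer
lb.squash_from(1)                            # both loads were on the wrong path
print(lb.load(3, 0x100))                     # memory: no footprint was left behind
lb.retire(3)
print(lb.load(4, 0x100))                     # cache
```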

foota 3084 days ago

I don't think you can allow two speculatively executing instructions to read from the same L0 cache.

For example, say the memory address whose caching you want to detect is either 0x100 or 0x200 (not realistic addresses, but they work for the example), depending on some kernel memory bit. Then run instructions in userspace that try to fetch 0x100 (with flushes in between). If you notice one that completes quickly, then it must have used the value 0x100 cached in the L0 cache by the kernel access? (And do the same for 0x200 to check when it's cached in L0.)

nialv7 3083 days ago

L0 is only used by speculatively executed uOPs, before they are actually committed. Therefore anything that reads from L0 has to be speculatively executed too.

So if the uOP that populated the L0 was reading from kernel memory, then it won't be committed. And subsequent uOPs that read from the L0 won't be committed either. So you can't get timing information from them.

foota 3082 days ago

But if another instruction reads from the same cache then that one could retire.

JdeBP 3083 days ago

Intel's paper outlines a roadmap for future work.

* https://newsroom.intel.com/wp-content/uploads/sites/11/2018/... (https://news.ycombinator.com/item?id=16079910)

woliveirajr 3083 days ago

I'm curious how Transmeta chips [0] would have suffered from, or been unaffected by, such exploits. Since it was a CPU that ran CPU microcode, the patch would probably be easier, if necessary at all.

[0] https://en.wikipedia.org/wiki/Transmeta

chacham15 3083 days ago

There was an HN article a while ago that discussed making use of an existing CPU ISA extension to solve the problem in a performant manner: PCID. More here: http://archive.is/ma8Iw
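The benefit of PCIDs for Meltdown page-table isolation can be illustrated with a toy TLB that counts misses. Under isolation, every system call switches to the kernel's page tables and back; without PCIDs each switch flushes the TLB, while with PCIDs entries are tagged with an address-space id and survive. The class, page numbers, and PCID values are hypothetical, global pages are ignored, and the sketch counts misses rather than measuring time.

```python
class ToyTLB:
    """Toy TLB that only counts misses (each miss would mean a page walk)."""
    def __init__(self, use_pcid):
        self.use_pcid = use_pcid
        self.entries = set()
        self.misses = 0

    def switch_page_tables(self):
        # Without PCIDs, loading a new page-table root flushes the TLB
        # (global pages are ignored in this sketch).
        if not self.use_pcid:
            self.entries.clear()

    def access(self, pcid, virtual_page):
        key = (pcid, virtual_page) if self.use_pcid else virtual_page
        if key not in self.entries:
            self.misses += 1
            self.entries.add(key)

USER_PCID, KERNEL_PCID = 1, 2  # hypothetical ids for user and kernel page tables

def syscall_heavy_workload(use_pcid, syscalls=1000):
    tlb = ToyTLB(use_pcid)
    for _ in range(syscalls):
        tlb.access(USER_PCID, 0x10)    # user code touches one of its pages
        tlb.switch_page_tables()       # syscall entry: switch to kernel page tables
        tlb.access(KERNEL_PCID, 0x20)  # kernel touches one of its pages
        tlb.switch_page_tables()       # syscall exit: back to user page tables
    return tlb.misses

print(syscall_heavy_workload(use_pcid=False))  # 2000
print(syscall_heavy_workload(use_pcid=True))   # 2
```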

cm2187 3083 days ago

By the way, I understand the fixes are being rolled out now. Do we have a more precise idea of the performance hit on Windows and Linux?