Hacker News new | ask | show | jobs
by MBCook 3092 days ago
See the Twitter thread here: https://twitter.com/nicoleperlroth/status/948678006859591682

(Edit: there are 9 posts total, go to her user page to see them all)

Seems there are two issues. One, called Meltdown, only effects Intel and is REALLY bad, but the kernel page table changes everyone is making fixes it.

The other, dubbed Spectre, is apparently common to the way all processors handle speculative execution and is unfixable without new hardware.

I’d like to know more about that but I haven’t seen anything yet.

Whoever discovered this stuff on Google’s team deserves some sort of computer security Nobel prize.

4 comments

That's not even close to a thread...

You can see all the tweets here (courtesy of @svenluijten): https://twitter.com/i/moments/948681915485351938.

After reading that thread, I sort of wonder if this is the catalyst for the next tech bust. Prices on the basic building block of the modern tech industry (a server shard) going up 30%, or even more as shared/virtual services must be decommissioned for isolation? Surely it’s an alarmist thing to think and I don’t think it’s likely, but if you asked me yesterday the likeihood of an underlying security vulnerability effecting every processor since 1995 I’d have said probably not.

Major props to the teams working on this... now time for us all to hold onto our pants as we ask for budget increases that will make shareholders demand blood.

would be interesting to look at public companies where server costs going up 20% would kill their profit margins.
I don't think this is likely anywhere.

The only sorts of companies where server costs could increase hugely due to a sudden need for hardware isolation are those where they're running tiny or incredibly bursty workloads. Big companies like Netflix that use tons of cores can just binpack their work all together on the same hardware so their jobs only share hardware with other jobs controlled by the same company. Effectively, cloud providers will start offering sub-clouds into which only your own jobs will be scheduled.

This is actually how cloud tech has worked for many years internally. I worked at Google for a long time and their cluster control system (Borg) had a concept called "allocs" which were basically scheduling sub-domains. You could schedule an alloc to reserve some resources, and then schedule jobs into the alloc which would share those resources. Allocs were often used to avoid performance-related interference from shared jobs, e.g. when a batch job kept hogging the CPU caches and slowing down latency sensitive servers. I suppose these days VMs and containers do a similar job, though I think the Borg approach was nicer and more efficient.

I guess this sort of per-firm isolation will become common and most companies costs won't change a huge amount. The people it'll hit will be small mom-and-pop personal servers, but they're unlikely to care about side channel attacks anyway. So I wouldn't sell stock in cloud providers just yet.

The linked thread suggests that Spectre doesn't have _any_ mitigation.

> The business/economic implications are not clear, since eventually the only way to eradicate the threat posed by Spectre is to swap out hardware.

Is this fully accurate, there's no software mitigation available now?

From [0], the above may be true:

> There is also work to harden software against future exploitation of Spectre, respectively to patch software after exploitation through Spectre .

There is 'work'? No current patch? So Spectre is unpatched?

This point doesn't seem to be being highlighted but appears particularly important.

[0] https://meltdownattack.com/#faq-fix

Yes, from my understanding, Spectre is an architectural-level flaw in the so-called speculative execution unit. In other words, Spectre will only be fixed once Intel, AMD, and ARM redesign the unit and release new processors. Given the timelines of CPU design, this will take 5-10 years at least.

On the positive side, the flaw is very difficult to exploit in a practical setting.

> On the positive side, the flaw is very difficult to exploit in a practical setting.

Is it?

"As a proof-of-concept, JavaScript code was written that, when run in the Google Chrome browser, allows JavaScript to read private memory from the process in which it runs"

So is this fixable or not?
There are possible mitigations for cloud providers: 1) pay $x / hour and run on shared machine with possibility of an attack; 2) pay $y / hour (where x < y) and run all your processeses on dedicated machines without anybody else.

Moreover the option 2) already exists for large customers and security sensitive applications (e.g. CIA dedicated cloud built by Amazon).

Amazon instances can be created with the dedicated flag. The host hardware will be dedicated to you, not shared with any other users. It should mitigate the attack.

The flag has a fixed fee in the thousands of dollars and each instance is 10% more expensive.

I didn’t find out there were more than 4 posts until after I made my comment (thus the edit).

Thanks for the handy link.

I can't really see how it would be fixable even with new hardware.

Speculative execution is fundamental to getting decent performance out of a CPU. Without it you should probably divide your performance expectations by 5 at least.

Rolling back all state rather than just user visible state in the CPU is neigh on impossible. When you evict something from the cache, you delete it. Undeleting is hard. There are also a lot of other non-user-visible bits of state in a CPU.

I agree that we'll probably see new attacks in this area for a long time.

That said, the main new ingredient of Spectre seems to be the idea that userspace can poison the branch target buffer to cause speculative execution of arbitrary code in kernel space. That part of the attack should be fairly easy to mitigate with new hardware, by XORing (or hashing) the index into the BTB with a configurable value that depends on the privilege level. So each process has its own "nonce", and they're all different from the kernel's.

Then BTB poisoning won't work unless the attacker knows its own and the other context's nonce. Even if further attacks are found that leak this nonce, they could be mitigated by changing the nonce at regular intervals.

Couldn't you do something like have a separate chunk of "speculative cache" which you only commit to the main cache once the speculatively-executed instructions are retired? Sounds complex, sure - but it seems like that would give you the performance benefits of speculative execution while still being able to roll back (or prevent in the first place) any cache-state side effects when branches were mispredicted. Could also imagine processors start segregating cache by privilege level.

I guess part of the question you're raising is: are there so many different caches, translation buffers, etc. in a modern CPU that keeping 'uncommitted buffers' for the state of all of them would be just as complex as throwing a whole other core in there?

No, that would not be enough. CPUs speculatively execute across multiple branches. Even if you had a separate speculative cache for every code path, you could still build a side-channel from the amount of contention. [1]

[1] https://eprint.iacr.org/2016/613.pdf

> Both hardware thread systems (SMT and TMT) expose contention within the execution core. In SMT, the threads effectively compete in real time for access to functional units, the L1 cache, and speculation resources (such as the BTB). This is similar to the real-time sharing that occurs between separate cores, but includes all levels of the architecture. [...] SMT has been exploited in known attacks (Sections 4.2.1 and 4.3.1)

Possibly - but that's an entirely new processor design. That would take years to get released an adopted.

The scary thing is that you can't fix this in software.

It effectively means wiping the caches, TLBs, BTBs and any other caches and optimisations on any form of context switch, as far as I can see? Which yes will likely require new silicon.
> computer security Nobel prize

While they're not as big of a deal AFAIK, we do have the Pwnie Awards: https://pwnies.com/

We should have an annual vulnerability/amelioration award (the Cerberus?) and give one to those guys.