Hacker News new | ask | show | jobs
by rayiner 3083 days ago
Is glorious the right word for it? We’re going back to the stone ages where processors couldn’t predict the targets of indirect jumps. More generally, this seems to me like an attempt to patch out of what is really a class of attacks leveraging fundamental assumptions about high-performance CPU design. Before, OOO just had to preserve correctness and (some of) the order of exceptions and memory operations. Now, it has to preserve (some of) the timing of in-order execution too? Where does this path end?
7 comments

Legitimate question: on any non-shared non-virtualized system is there any reason to enable these workarounds besides running sandboxed applications such as javascript in a web browser (or flash/java applets/Active X, but those are not really super popular nowadays)?

For any other non-sanboxed application you pretty much have to trust the code anyway. Privilege escalation is always a bad thing of course, but for single user desktop machines getting user shell access as an attacker means that you can do pretty much anything you want.

As far as I can see the only surface of attack for my current machine would be a website running untrusted JS. For all other applications running on my machine if one of them is actually hostile them I'm already screwed.

Frankly I'm more annoyed at the ridiculous over-engineering of the Web than at CPU vendors. Because in 2017 you need to enable a turing complete language interpreter in your browser in order to display text and pictures on many (most?) websites.

Gopher should've won.

This unfortunately also affects almost all mobile apps and modern Windows installations, as they all run Javascript-enabled ads. Maybe this might cause Microsoft to reconsider what it allows to run on Windows but I don't see mobile ads going away any time soon.
>javascript-enabled ads

Good opportunity to get rid of them.

Or, as a compromise, no third-party javascript. Google can easily code up the 100 most common javascript ad formats and let advertisers pick from a menu.
Why would you need a Turing-complete language to describe advertisements? Why can't they be static images? Or static HTMLs?
From the advertiser's perspective, it wouldn't be a Turing-complete language, since they would only have access to standardized templates. Such a system would probably have to be implemented in Javascript at the browser level though, unless you could do it all with CSS animations.
I'd rather just be rid of them altogether. No compromise.
Do they really need 100 formats?

1. Video ad that autoplays.

2. Punch the monkey.

3. ???

> on any non-shared non-virtualized system is there any reason to enable these workarounds

Does the non-shared non-virtualized system have any encryption keys in memory that you want to protect?

Do you use full-disk encryption or ssh to other machines or use a cryptocurrency wallet?

If one hostile application running on my machine isn't sandboxed then SSH local keys are pwned anyway. Might as well install a keylogger or just hijack ssh-agent directly. Full disk encryption keys might not be but the app will have access to any mounted and unlocked safe. Ditto for cryptocurrency wallets without a hardware token (and even with a hardware token if the app can get it to sign a bogus transaction).

I don't think this particular vulnerability significantly increases the surface of attack for any non-sandboxed application running on my computer. There are much easier and straightforward ways to get access to anything an attacker with shell access may want that don't involve dumping the kernel VM. So in my situation the only vector of attack I'm worried about is JS running in the browser since I gave up on javascript whitelisting long ago when I realized that most of the web is unusable when you don't allow heaps of untrusted scripts to run all over the place. I don't have time to audit the source code of every random website I visit.

These questions are only relevant if you're not controlling and trusting all the code you're running that system. For a consumer system this is true if (and basically only if) you're running a web browser on that system.

If you're confident in the software you're running on a non-shared hardware, both Meltdown and Spectre are non-issues requiring no mitigation. This is a narrow class of systems, but it exists.

> For a consumer system this is true if (and basically only if) you're running a web browser on that system.

… which is pretty close to universally true, especially when you consider how many people use apps which are based on something like Electron. If those apps load code or, especially, display ads there's JavaScript running in places people aren't used to thinking about.

> If you're confident in the software you're running on a non-shared hardware...

and that you won't be hit by a remote execution vulnerability.

That's what being confident in the software means.
As the owner of gopher://jdebp.info/1/ and the author of gopher://jdebp.info/h/Softwares/djbwares/guide/gopherd.html , I disagree. GOPHER needs a lot of improvement merely to learn the lessons that people learned with FidoNet in the 1980s.
Followup legitimate question: the only way to read data is to control the results of a speculative execution or fetch right?

For JavaScript won't it be sufficient to check all the calls out of it so that they can't pass data that controls an exploitable speculative execution, and also generate JIT code so the JS itself can't create exploitable instructions. The API will have to be heavily scrutinized and the JS will run somewhat slower.

If the rest of the browser code is vulnerable, but the JS code can't control the speculative execution then it should be safe to run any JS.

Javascript is theoretically fixable -- what's needed is less fine-grained timing capabilities. I think it would be very hard to completely eliminate the possibility of controlling speculative execution as long as you can predictably invalidate the CPU branch predictor, which can be done even at a high level with if statements. Unless you get rid of the notion of contiguous arrays (making predictable word-aligned cache manipulation nearly impossible) and potentially remove the JIT completely in favor of an interpreter, it's hard to completely be rid of this class of an attack when executing any sort of code on the processor. Those steps are probably possible, but the JS performance hit that ensues would be the death knell of any browser.

That said, without some way of extracting timing at the granularity of 10s of instructions, this attack is moot. So that's likely going to be the mitigation. Unfortunately, the web frames used in some apps are infrequently if ever updated, so JS engine updates there are gonna be hard.

Why does it matter if they can cause the CPU's predictors to guess incorrectly if they can't control the target address of the branch or memory access?

Example: "if (a < length) return data[a]". If "a" comes directly from JavaScript then they trick the CPU into fetching data[a] even if it's invalid speculation and thrown out. But if there's a safe barrier between "if (a < length) { prevent_speculative_execution; return data[a]}" then they cannot learn anything.

I concede that safely checking all data coming from JS code to the browser would be a huge task, but pretty sure it would work to fix the problem for JavaScript although not in general, between processes with shared IO pages and such.

Honestly, you probably don't even need the barrier in your example. Getting data[a] into the cache is no information leak if the attacker already knows a. That's why the example in the Spectre paper uses an additional level of indirection.
Yes thanks to HN being so quick to freeze comments I was unable to fix the example.

Point is, a JavaScript program in isolation cannot read anything, it has to interact with the other target code somehow. If that interaction (the data passed over the API call) can't fail after a certain point and can't be used to read data before that point, then the JS can't read anything.

> Where does this path end?

It ends with the performance advantages of OOO execution being effectively negated by the workarounds to address the security issues it causes.

The following parable is edifying: https://www.cs.utexas.edu/users/EWD/transcriptions/EWD05xx/E...

Seems like the ultimate end-game here is to have mini-vms for every process using CPU-level ring protection. If you can't speculate across privilege levels, only inside them, it isn't a security problem anymore.
Or time to have Kernel live on dedicated cache not ever accessed/shared with anything else. Let the CPU speculate all it wants, just not when playing in the kernel's cache. It may even be time for dedicated kernel cpus/cores.
Reading Kernel memory (Meltdown attack) is extra bad but regular user processes being able to read each other's memory (Spectre attack) is also very bad and not solvable by isolating the kernel.
Im less worried about my steam client reading my chat cache than something inside my web browser reading the keys that encrypt my home directory. Short of abandoning all sharing, the least we can do is isolate kernel cache.
That depends on what you are chatting with. My chatlog would be very interesting to our competitors. The key that encrypts my home directory isn't useful because the firewall blocks your access to my home directory (that a different layer of security).
Why one solution is put secrets in kernel and use meltdown mitigation to protect.
> It may even be time for dedicated kernel cpus/cores.

Oh yes, I agree! One needs to be able to phycically (un)lock the "kernel fpga" like a door without remote capabilities, except for server cpu's. Or whatever chip designers believe is a good "physical kernel embodiment" other than fpga.

EDIT: I know it's not really clever, but I would really enjoy hearing any solutions that doesn't try to fix it at the hardware level.

Qubes OS [1] does something like that

[1]: https://www.qubes-os.org/

Yet Meltdown nuked exactly that.
> mini VMs for every process using CPU ring protection

Yes. We should really start to learn from history, MULTICS operating system had already 16 CPU ring support back in the early 1970s. MULTICS is the mother of UNIX, its smaller child. MULTICS had so many advanced features that barely got implemented (often reinvented) in newer OS. It's time to read old docs and ask the old devs who are still alive. (Another such often overlooked gem is Plan9, but it's better known thanks to Go lang devs).

Older Intel CPUs only supported 2 rings. Modern Intel CPU supports only 4 rings. Windows and Linux use ring 0 for kernel mode and ring 3 for user mode. And Intel introduced a ring -1 for VT.

  "To assist virtualization, VT and Pacifica insert a new 
  privilege level beneath Ring 0. Both add nine new machine 
  code instructions that only work at "Ring -1," intended to 
  be used by the hypervisor
It's time for modern operating systems to use more rings, and modern CPUs to correctly protect between different rings.

https://en.wikipedia.org/wiki/Multics

https://en.wikipedia.org/wiki/Protection_ring

We have different utility functions, you and I.
tptacek exploits computers for a living, so it's glorious for him :)
It's not that it makes the practice of breaking into computers that much more interesting so much as it makes the underlying field much more interesting to work in. The engineering problems just got a lot more complex. We're all taking an attack vector seriously --- microarchitectural side channels --- that we weren't taking as seriously before, except as an abstract threat to crypto and a way of defeating a mitigation --- KASLR --- that nobody believed in anyways.

What's glorious is that serious software security people now have to start being literate about what it means to reverse engineer and dump the branch history buffers on different CPUs. Getting dragged through this kind of minutiae is the reason I'm still in this field after 22 years.

And I'm just a bystander here. Imagine what it must have been like for Jann Horn over the last several months!

This subsection describes how we reverse-engineered the internals of the Haswell branch predictor. Some of this is written down from memory, since we didn't keep a detailed record of what we were doing.

... because shit was so crazy while they were working this out that they didn't have the cycles to write everything down!

I'd be surprised if other Googlers like Christian Ludloff (also of sandpile.org fame) and Dean G were not involved. They know x86 better than many engineers at Intel/AMD, having been in charge of the most performance/cost critical code (e.g. tuning search serving down to the last cycle), as well as qualifying new platforms and identifying a steady stream of CPU bugs.
As someone specializing on a different field, this comment reads to me as if you were an astronomer being excited about discovering a gigantic meteor heading towards the Earth :)
Sure, if your subfield of astronomy was all about practical ways to adapt to living on meteor fragments!
Some of us would be terrified, but also excited about the prospect of studying a supermassive black hole from the inside. You know what I mean? Some things are just so singular and incredible, and so to the heart of a given field of interest, that “glorious” is a perfect word to describe it. An alien invasion of Earth would be a nightmare, but for some it would be the moment they could finally move past the realm of the hypothetical.

Passion is passion, even when it’s terminal.

> they didn't have the cycles to write everything down!

hahaha that was a good pun! Do you have a link to Jann Horns personal blog or Github? I've not never heard of him before.

Isn't that a bit like a firefighter saying your house burning down is "glorious"?
How many firefighters do you know? I guarantee you every one of them gets excited at the prospect of a "good" structure fire.
Maybe more like a meteorologist excited by a really bad storm.
The result might not be glorious but the fire is amazing to watch.
To the extent that he exploits computers for a living (e.g. pentesting) and not stops exploits for a living, it seems more like a (presumably law-abiding) arsonist calling houses burning glorious.
Arsonists set fires. Vendors create vulnerabilities, not pentesters.
Ooooh, I dunno, I might argue that vendors have their role in creating pentesters. ;-)
I like to think of a lot of vulnerability discovery and research as solving a puzzle. In the sense that this puzzle has so many far reaching implications makes it totally compelling to me. tqbf says "glorious", and I couldn't disagree.

[Edit] Or, how far down does the rabbit hole go?

Additionally, it is quite fascinating to me to compare the complexity of modern CPRUs with, say, a compiler.

> leveraging fundamental assumptions about high-performance CPU design.

I believe the generalized fix is to restore the entire CPU state after a mispredict. You’d either need to add an extra copy of the entire processor state (tens of megabits) for every simultaneous predict you support ($$$) or keep track of how to revert all changes and revert them one at a time ($, slow).

This is harder than it seems, because once cache is deleted you can't just un-delete it, you'd have to go back to memory and pull it again.

Only the "extra copy of processor state" thing is really viable. You have to have a speculative cache and buffer in reads that only get flushed to the main cache once they're confirmed to be valid, which is enormously complicated. This facility already exists for writes, but now it needs to exist for reads too.

GP is absolutely correct that this is a fundamental assault on processor design as we know it, the speculative execution concept is going back to the drawing board for a major re-think.

I can’t help wondering what igodard’s day is like so far...
"I'm walking on sunshine..."
Wasn't Intel's transactional CPU Memory the solution, but it also failed to to bugs?

Sorry for quoting wikipedia, but I'm not at school, hah! [1]

'''' TSX provides two software interfaces for designating code regions for transactional execution. Hardware Lock Elision (HLE) is an instruction prefix-based interface designed to be backward compatible with processors without TSX support. Restricted Transactional Memory (RTM) is a new instruction set interface that provides greater flexibility for programmers.[13]

TSX enables optimistic execution of transactional code regions. The hardware monitors multiple threads for conflicting memory accesses, while aborting and rolling back transactions that cannot be successfully completed. Mechanisms are provided for software to detect and handle failed transactions.[13]

In other words, lock elision through transactional execution uses memory transactions as a fast path where possible, while the slow (fallback) path is still a normal lock. ''''

[1] https://en.wikipedia.org/wiki/Transactional_Synchronization_...

Wasn't Intel's transactional CPU Memory the solution, but it also failed to to bugs?