Hacker News
by tptacek 43 days ago
This is an interesting post from Cloudflare, as usual, but it's not clear to me why they would have been vulnerable to CopyFail. Did I miss the point in this blog post where that's addressed? What triggered the threat hunting and mitigation effort? At what points in their architecture were they reliant on Linux user-based access control?

They weren't vulnerable to it in anything but an academic sense. They call that out up front: "There was no impact to the Cloudflare environment, no customer data was at risk, and no services were disrupted at any point."

This was probably written by their security team. Security teams are paranoid. They want everything patched everywhere, all at once, at severity level zero. Also, PR. Also, also: if through some lack of imagination this was somehow involved in an exploit of their services, it would look really, really bad. So, CYA.

Yeah I think what I'm trying to clarify here is: are they doing a threat hunting exercise out of concern for multitenant exposures, or out of concern for internal privilege escalation?

Cross-tenant would be very surprising! But I don't know enough about their architecture.

It's weird, right? The underlying CNE primitive here, for CopyFail, is not novel. These happen all the time. Why the announcement? Is it just because CopyFail got so much attention?

I can upload arbitrary code to Cloudflare workers, which they run on their systems. It's sandboxed, but in the big bad Internet, if you were Cloudflare, how much would you really trust that sandbox?
Let's say an attacker escapes the sandbox and gets a local non-root shell on the machine. At that point, how much more access does escaping to root gain the attacker? (This is a rhetorical question. Cloudflare doesn't say, which I think is the point of this line of questioning.)
I don't actually know anything about their architecture, but if you somehow gained root on a Cloudflare worker box, the scenario I'm sure they've designed against is that attacker then stealing the private keys for all the TLS traffic hitting that machine, exfiltrating all data going through it, and injecting their own content into what visitors see.
Why are you sure of that? I wouldn't design a critical system that relied on the difference between root and non-root accounts to protect private keys. I would design a system assuming the attacker can trivially escalate to root privilege. Because historically you just cannot rely on the difference. LPE attacks simply happen on too regular a basis.
It's not running with direct access to Linux kernel system calls, is it?
The whole IT industry is reliant on Linux user-based access controls; it is not a Cloudflare thing.
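To make "Linux user-based access control" concrete, here is a minimal, simplified Python model of the discretionary read-permission check applied to a file. It is a sketch under stated assumptions, not the kernel's actual code: it ignores ACLs, LSMs such as SELinux or AppArmor, namespaces, and every capability except treating uid 0 as bypassing read checks. `Cred` and `may_read` are hypothetical names for illustration only. It shows why an LPE like CopyFail matters: the only thing keeping a `nobody` process away from a root-only (mode 0600) private key is a uid comparison, which is exactly what a local privilege escalation defeats.

```python
import stat
from dataclasses import dataclass


@dataclass(frozen=True)
class Cred:
    """Hypothetical process credentials: real uid, primary gid, supplementary groups."""
    uid: int
    gid: int
    groups: frozenset = frozenset()


def may_read(cred: Cred, st_mode: int, st_uid: int, st_gid: int) -> bool:
    """Simplified POSIX DAC read check (assumption: no ACLs, no LSMs)."""
    if cred.uid == 0:
        # Root normally holds CAP_DAC_OVERRIDE, so mode bits don't stop a read.
        return True
    if cred.uid == st_uid:
        # The owner is judged by owner bits only, even if group/other would allow it.
        return bool(st_mode & stat.S_IRUSR)
    if cred.gid == st_gid or st_gid in cred.groups:
        return bool(st_mode & stat.S_IRGRP)
    return bool(st_mode & stat.S_IROTH)


if __name__ == "__main__":
    key_mode = stat.S_IFREG | 0o600          # e.g. a root-owned private key file
    nobody = Cred(uid=65534, gid=65534)      # typical uid/gid for 'nobody'
    print(may_read(nobody, key_mode, 0, 0))         # False: denied by mode bits
    print(may_read(Cred(0, 0), key_mode, 0, 0))     # True: uid 0 short-circuits
```

Because the uid 0 branch short-circuits everything else, a design that keeps secrets readable only by root on a shared host is one LPE away from exposure, which is the point made above about not relying on the root/non-root distinction.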

Also, leaving a massive gap like this behind would be a mistake on multiple levels. For example, it might get combined with another exploit that achieves unprivileged access to some piece of metal, or a disgruntled employee without admin access could escalate their permissions on a box where they aren't supposed to see all the secrets.

> For example, it might get combined with another exploit that can achieve unprivileged access ...

Yeah. TFA mentions datacenters in 330 cities. That's a lot of Linux boxen. And many of those have, by definition, ports opened to the big bad Internet. These Linux servers are running services. They answer to ping, for a start. I even heard some are running DNS servers. Remote local exploits are a thing.

What does Cloudflare prefer: that when the next remote local exploit surfaces, their whole fleet is one copy.fail away from privilege escalation to root, or that they get time (given that they obviously have quite advanced detection measures in place) to detect the intruder before it gains root everywhere?

It's Linux. It's datacenters in 330 cities. Linux powers the world, and that's how things work.

I, for one, am glad to have owned Cloudflare stock since right after the 2022 crash and, for two, I'm happy they don't leave their huge fleet of Linux servers exposed to an unpatched exploit.

I'm not asking why they'd need to go threat-hunting if there was an ICMP kernel RCE in Linux. CopyFail requires someone untrusted running shell commands somewhere. Where is that exposure in their architecture?

I'm asking because I don't think they have such an exposure.

At the very least, Cloudflare hosts Workers, which let a customer execute more-or-less arbitrary wasm code on their servers. If there's an exploit that lets you escape the wasm sandbox, copy.fail can be chained into (afaiu) an exploit against the Linux host. That's a pretty big risk.

Also, Cloudflare hosts some AI services, so it's possible that some customers are running Python code in their containers, without the wasm sandbox.

If there's a direct link from Cloudflare workers / WASM to uid=nobody execve or arbitrary syscalls on their hosts, they're already fucked, so I don't think that's true.
I don't understand your point.

You seem so hung up on "why would they even patch this!!!" Maybe because it's best practice to patch things? You never know what things could be chained together, so you might as well patch this, given it's so obviously bad.

I would assume it was about protecting their servers from internal sources escalating privileges vs. them providing publicly accessible Linux shells.
I mean, that's a real project, but Linux LPEs kind of grow on trees, so you can't literally rely on threat intelligence for this problem; presumably you handle it by drastically scoping down and surveilling what people do on prod hosts.