Hacker News
by longtermsec 2585 days ago
The idea that it's trivial to break out of any Docker style container just doesn't reflect reality.

Not just being contrarian here: the reality is that it might actually be trivial, and it was demonstrably trivial for a long time (see CVE-2019-5736).

As for contained.af, it's not a good indicator; it mostly indicates that the reward doesn't meet the market price for demonstrating an escape from a set of hardened namespaces (which is going to cost more than an escape from "any docker container").


So the runc vuln only applied if you a) were running as root in the container and b) hadn't enabled user namespacing. (Also, for completeness, IIRC it didn't work on RHEL-based distros that applied their standard SELinux policy.)
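
To make conditions a) and b) concrete, here is a rough Python sketch of how a process could check them for itself. It assumes the usual `/proc/self/uid_map` format of `inside outside length` per line; the function names are made up for illustration, and the map only describes the parent user namespace, which is not necessarily the host when namespaces are nested. It is not a detector for the runc bug itself.

```python
import os


def parse_id_map(text):
    """Parse uid_map contents into (inside, outside, length) tuples."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            entries.append(tuple(int(p) for p in parts))
    return entries


def container_root_is_parent_root(uid_map_text):
    """True if uid 0 in this namespace maps to uid 0 in the parent namespace."""
    for inside, outside, length in parse_id_map(uid_map_text):
        if inside <= 0 < inside + length:
            return outside + (0 - inside) == 0
    return False  # uid 0 is not mapped at all


def meets_both_conditions(euid, uid_map_text):
    """Condition a): running as root. Condition b): no uid remapping for root."""
    return euid == 0 and container_root_is_parent_root(uid_map_text)


if __name__ == "__main__":
    try:
        with open("/proc/self/uid_map") as f:
            print(meets_both_conditions(os.geteuid(), f.read()))
    except (OSError, AttributeError):
        print("uid_map not available on this system")
```

With user namespace remapping enabled, the map would typically start at some high, unprivileged outside uid rather than 0, so the check returns False even for root inside the container.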

Also, it wasn't specifically a Docker vulnerability; it was a runc issue which also affected other Linux containerization software (e.g. lxc).

But despite all that, that's just an example of what I was talking about: all software has vulns, including runc, including gVisor.

Stating that "containers don't contain" implies that it's not just a specific bug, but that architecturally the process is flawed (at least IMHO), which I would suggest is at the least an over-simplification.

As to contained.af, well, if it was indeed "trivial" then surely a large reward wouldn't be required :)

so a) and b) are common in practice. these were not obscure boundary conditions or a corner case. and it was very trivial to exploit.

"all software has vulns" is a slippery slope, is my overarching point. you can't use that to say that the security risks and isolation are comparable to gvisor. gvisor does away with a very significant amount of attack surface in the linux kernel and reimplements it in golang, which eliminates many bug classes.

for a realistic risk assessment you should consider the linux kernel as a bottomless barrel of memory management bugs, which are exploitable from within a container, whereas gvisor will have a much more finite set of bugs

On our team we've got extensive experience in finding compromises in this area, particularly in kernels, and that is why I am adamant that one should not think what docker provides meets the bar for best practices in a security critical environment. Something like gvisor would much more fit the bill.

The original point I was making was that dismissing container isolation with the trope "containers don't contain" is overly simplistic, not that I thought that docker/runc containers with a default profile had as small an attack surface as gVisor.

Generally the security of a piece of software isn't considered fundamentally flawed just because it has a security bug, otherwise pretty much every piece of software would be in that bucket by now. As such dismissing containers using that trope based on a bug which wasn't discovered when the trope was coined (by Dan Walsh IIRC) doesn't seem appropriate.

There have been (AFAICR) three breakouts that would affect a default Docker installation in the last 3-4 years (Dirty COW, waitid, and the runc issue). That doesn't feel like a particularly high incidence, and gVisor has had at least one in its shorter lifespan...

If it's always trivial to break out of docker/containerd/runc containers, as (if I'm understanding you correctly) you appear to be implying and as the trope appears to imply, then I imagine people will be making good money from bug bounties for a long time, as a lot of companies are creating platforms which execute semi-trusted or untrusted code in runc containers.

I'm not sure that it is overly simplistic. I think the statement that "containers do not contain" is an intentional oxymoron that points to some ground truths. These ground truths are that a process in a container is running in the same kernel, and although namespaces are meant to isolate some set of resources from other processes, there are still very many shared resources that might not be isolated at all. This means a lot of attack surface, and exploiting the kernel will grant access to the other processes on the system.
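
One way to see which namespaces two processes actually share is to compare their `/proc/<pid>/ns/*` links. The Python sketch below assumes a Linux `/proc` and enough privilege to read the other process's entries; the helper names are hypothetical. It only shows namespace-level sharing and says nothing about the kernel attack surface, which is shared regardless of what it prints.

```python
import os

# Namespace types exposed under /proc/<pid>/ns on most current kernels.
NS_TYPES = ("cgroup", "ipc", "mnt", "net", "pid", "user", "uts")


def read_namespaces(pid="self"):
    """Return {ns_type: link_text}, e.g. {"pid": "pid:[4026531836]"}."""
    result = {}
    for ns in NS_TYPES:
        try:
            result[ns] = os.readlink(f"/proc/{pid}/ns/{ns}")
        except OSError:
            pass  # missing /proc, unsupported type, or no permission
    return result


def shared_namespaces(a, b):
    """Namespace types where both processes point at the same namespace."""
    return sorted(ns for ns in a if ns in b and a[ns] == b[ns])


if __name__ == "__main__":
    # Compare this process with PID 1 as seen from here.
    print(shared_namespaces(read_namespaces("self"), read_namespaces(1)))
```

Run inside a container against a host process (where that is visible at all), any type that shows up in the output is one the container was not given its own copy of.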

In terms of quantity, 4 is not an accurate picture. I haven't sat down to analyze CVEs (https://www.cvedetails.com/product/47/Linux-Linux-Kernel.htm...), but say out of 50 practically exploitable kernel memory corruption bugs/year 4-5 new bugs every year are reachable from some common namespace configuration for a container. And this just marks what is publicly disclosed, which is a subset of the vulnerabilities attackers know about.

Bounties aren't the only outlet for these, see: VEP.

So (as I'm sure you know) Linux container isolation isn't just a product of namespaces, but of namespaces+capabilities+cgroups+(SELinux/AppArmor)+seccomp-bpf. Each one of those layers provides some aspect of isolation, and for a Linux kernel exploit to succeed in escaping a container it needs to bypass/compromise each one (or, as in the case of the runc vulnerability, occur prior to the sandbox being fully established).
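
As a rough illustration of a few of those layers, the Python sketch below parses `/proc/self/status` and reports the effective capability mask, the seccomp mode, and the no-new-privs flag. It assumes the standard `CapEff`, `Seccomp` and `NoNewPrivs` fields are present; the function names are made up for this example, and it deliberately leaves out cgroups and the SELinux/AppArmor label.

```python
CAP_SYS_ADMIN = 21  # bit index from linux/capability.h

# Values of the "Seccomp" field in /proc/<pid>/status.
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def parse_status(text):
    """Turn 'Key:\\tvalue' lines into a dict of stripped strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def summarize_layers(status_text):
    """Summarize capability, seccomp and no_new_privs state."""
    fields = parse_status(status_text)
    cap_eff = int(fields.get("CapEff", "0"), 16)  # hex bitmask
    seccomp = int(fields.get("Seccomp", "0"))
    return {
        "has_cap_sys_admin": bool(cap_eff & (1 << CAP_SYS_ADMIN)),
        "seccomp": SECCOMP_MODES.get(seccomp, "unknown"),
        "no_new_privs": fields.get("NoNewPrivs", "0") == "1",
    }


if __name__ == "__main__":
    try:
        with open("/proc/self/status") as f:
            print(summarize_layers(f.read()))
    except OSError:
        print("/proc/self/status not available on this system")
```

A process with CAP_SYS_ADMIN and seccomp disabled has far fewer layers between it and the kernel than one running with a reduced capability set under a seccomp filter.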

So just taking Linux kernel bugs as a metric doesn't really apply.

That's why I gave the list I did, as those are the only ones which I'm aware of which can bypass all the layers of isolation in a standard Linux container.

If the ground truth "containers don't contain" applies, then it appears you're saying that Linux is innately and architecturally unsuitable for multi-user/process use, which seems like a fairly bold statement given its prevalence...

After all, all a container is, is a Linux process with Linux isolation mechanisms applied to it...

bingo. one should always assume that userland access on a linux box is a short step away from full system privileges and active exploits are ready for use by an attacker.

docker has started adding hardening with SELinux+Seccomp because there's a realisation that the linux kernel bugs keep coming, but this is just a bandaid. the other problem with this approach is that in practice a hardened config is too restrictive for real users and has real maintenance cost, so most will never use one (as argued by others in this thread for why the gvisor approach is superior). AppArmor is very poorly maintained, buggy, and not a practical solution.

If you do cybersecurity work and Zerodium bug bounties for your stack are less than your yearly wages, you are honor-bound to offer your resignation and request that the company use your salary towards bug bounties.

Fortunately zerodays aren't commonly used.