by TrueDuality
1535 days ago
The reason containers are not generally considered a security boundary is that many of the namespace primitives were _not designed_ as a security layer: they aren't meant to actively reduce the privileges of the current user's context. Since most containers are started as the root user, the namespace transition inherits root's permissions even if they're later dropped. Without SELinux or seccomp restrictions, root can still do pretty much anything to the host, even from inside a container.
For the most part this is troublesome when parts of the kernel or host userspace code are not fully aware of the different forms of namespacing (there are still portions that just check for an effective UID of 0 without checking whether they're in a namespace, for example). These are the components where a lot of container breakouts happen, and they are largely mitigated by not running the processes inside the container as root in the namespace. Dropping privileges to a different user still traces its origin back to the root user on the host, so in some cases, a section of kernel or host userspace code that is only partially aware of namespaces actively hurts security by tracing the user back to root and using those privileges again.
SELinux really tightens the potential to pull these shenanigans, but most production k8s clusters, at least the ones I've seen, are built on Ubuntu, where those protections aren't available. In this case the security layer is once again SELinux, not the namespacing. As long as the container runtime performs the various namespace isolation primitives starting from the root user, these container bypasses are going to be a risk. There are 'rootless' versions of containers which can only use the privileges available to a lower-privileged (presumably heavily restricted) user, but those aren't widely used. Once again this relies on the host's user authorization for protection, not on the namespaces.
The networking analogy is NAT. People treat it like a security layer because it kind-of-sort-of looks like an ingress firewall (you can't directly address devices behind a NAT), but it's not, and it can be pierced pretty easily. NAT is not a firewall. Namespaces are not a security layer.
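Which host UID a namespaced UID actually "traces back to" is decided by the namespace's UID map. Per user_namespaces(7), /proc/[pid]/uid_map holds lines of the form `ID-inside-ns ID-outside-ns length`; when a process reads its own map, the outside IDs are in the parent namespace. Here's a minimal Python sketch of that one-level translation; the helper names and the example maps are made up for illustration, not taken from any particular runtime.

```python
from typing import List, Optional, Tuple


def parse_uid_map(text: str) -> List[Tuple[int, int, int]]:
    """Parse uid_map text: each line is <first ID inside> <first ID outside> <length>."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        inside, outside, length = (int(f) for f in fields)
        entries.append((inside, outside, length))
    return entries


def to_parent_uid(uid: int, uid_map_text: str) -> Optional[int]:
    """Translate a UID inside the namespace to the UID it maps to one level up."""
    for inside, outside, length in parse_uid_map(uid_map_text):
        if inside <= uid < inside + length:
            return outside + (uid - inside)
    return None  # unmapped; the kernel reports such IDs as the overflow UID


# Illustrative maps only (not read from a real system):
IDENTITY_MAP = "0 0 4294967295\n"            # identity map: UID 0 is host root
ROOTLESS_MAP = "0 1000 1\n1 100000 65536\n"  # hypothetical rootless runtime started by UID 1000

if __name__ == "__main__":
    print(to_parent_uid(0, IDENTITY_MAP))  # 0
    print(to_parent_uid(0, ROOTLESS_MAP))  # 1000
```

With the identity map that the initial namespace normally shows, UID 0 really is host root; with a map like the hypothetical `0 1000 1`, container "root" is host UID 1000. Nested namespaces need one translation per level.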
That's not true.
> For the most part this is troublesome when parts of the kernel or host userspace code are not fully aware of the different forms of namespacing (there are still portions that just check for an effective UID of 0, without checking whether they're in a namespace for example).
Yes, like I said:
> But an attacker can escape by exploiting the kernel, which I think most security people would consider to be not particularly high effort.
> Dropping privileges to a different user still traces its origin back to the root user on the host
It does not. It only does so if the process creating the container is root, which, with unprivileged user namespaces, is not (necessarily) the case.
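To make that concrete, here's a rough Python sketch of what `unshare --map-root-user` does, assuming Linux with unprivileged user namespaces enabled: a forked child calls unshare(2) (through ctypes, since the os module only gained `os.unshare` in Python 3.12), writes a one-line uid_map mapping UID 0 to its own host UID, and reports what getuid() says. The function name is hypothetical.

```python
import ctypes
import os
from typing import Optional, Tuple

CLONE_NEWUSER = 0x10000000  # value from <sched.h>
_libc = ctypes.CDLL(None, use_errno=True)  # assumes a Linux libc that exports unshare()


def root_inside_new_userns() -> Optional[Tuple[int, int]]:
    """Fork a child that creates a user namespace, maps its own UID to 0 inside it,
    and reports getuid(). Returns (host_euid, uid_inside), or None if not permitted."""
    host_euid = os.geteuid()
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: single-threaded, which unshare(CLONE_NEWUSER) requires
        code = 1
        try:
            os.close(r)
            if _libc.unshare(CLONE_NEWUSER) == 0:
                # Unprivileged rule from user_namespaces(7): a single line mapping our own euid.
                with open("/proc/self/uid_map", "w") as f:
                    f.write(f"0 {host_euid} 1\n")
                os.write(w, str(os.getuid()).encode())
                code = 0
        finally:
            os._exit(code)  # never fall through into the parent's code path
    os.close(w)
    data = os.read(r, 64)
    os.close(r)
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0 and data:
        return host_euid, int(data)
    return None


if __name__ == "__main__":
    print(root_inside_new_userns())
```

Run as an ordinary user, this should report UID 0 inside the namespace next to the caller's unchanged, unprivileged host UID; where a distribution restricts unprivileged user namespaces (via sysctl or LSM policy), it returns None instead.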
> The NS_GET_OWNER_UID ioctl(2) operation can be used to discover the user ID of the owner of the namespace; see ioctl_ns(2).
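For reference, a minimal Python sketch of that ioctl. Python's fcntl module doesn't appear to export NS_GET_OWNER_UID, so the request number is computed by hand from <linux/nsfs.h>; that encoding assumes an architecture like x86 or arm where _IO() carries no direction bits (some, such as powerpc, differ). It needs Linux 4.11+, and the function name is made up.

```python
import fcntl
import os
import struct

# <linux/nsfs.h>: NSIO is 0xb7 and NS_GET_OWNER_UID is _IO(NSIO, 0x4).
# On x86/arm, _IO(type, nr) has no direction or size bits, so it is (type << 8) | nr.
NSIO = 0xB7
NS_GET_OWNER_UID = (NSIO << 8) | 0x4


def userns_owner_uid(ns_path: str = "/proc/self/ns/user") -> int:
    """Return the UID that created the user namespace at ns_path, as seen from
    the caller's own user namespace."""
    fd = os.open(ns_path, os.O_RDONLY)
    try:
        buf = bytearray(4)  # uid_t is a 32-bit unsigned integer on Linux
        fcntl.ioctl(fd, NS_GET_OWNER_UID, buf)  # the kernel writes the UID into buf
        return struct.unpack("I", buf)[0]
    finally:
        os.close(fd)


if __name__ == "__main__":
    print(userns_owner_uid())
```

Pointed from the host at another process's namespace (e.g. /proc/<pid>/ns/user, permissions permitting), it reports the UID that created that namespace; for the initial user namespace it should be 0.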
"root" isn't the point anyways, it's about checking capabilities. The problem is that the Linux kernel has historically not cared about root -> kernel privesc, and containers expose more attack surface because of that. But an attacker outside of a container can still just enter a namespace (user namespaces are unprivileged) and perform the same exact privesc, so containers aren't making anything worse.
> As long as the container runtime is performing the various namespace isolation primitives starting from the root user these container bypasses are going to be a risk. There are 'rootless' versions of containers which can only use the privileges available to lower (presumably heavily restricted) user but those aren't widely used.
That's not how namespaces work. Even with 'rootless' containers, the guest has CAP_SYS_ADMIN (within its own user namespace). The only difference is that the daemon that starts the container isn't privileged, because creating user namespaces increasingly doesn't require privileges. Rootless changes nothing, except that attacks against the daemon itself won't be an insta-privesc to root on the host; they'll only be a privesc to the user running the daemon on the host.
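The CAP_SYS_ADMIN point is easy to check. Per user_namespaces(7), a process that creates a new user namespace gets a full capability set inside it. A minimal Python sketch, assuming Linux with user namespaces permitted and calling libc's unshare(2) through ctypes; the constants are copied from the kernel headers and the helper names are hypothetical.

```python
import ctypes
import os
from typing import Optional

CLONE_NEWUSER = 0x10000000  # value from <sched.h>
CAP_SYS_ADMIN = 21          # capability number from <linux/capability.h>
_libc = ctypes.CDLL(None, use_errno=True)  # assumes a Linux libc that exports unshare()


def effective_caps(status_text: str) -> int:
    """Return the CapEff bitmask from the text of /proc/[pid]/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_cap(mask: int, cap: int) -> bool:
    return bool((mask >> cap) & 1)


def sys_admin_in_new_userns() -> Optional[bool]:
    """Fork a child that calls unshare(CLONE_NEWUSER) and checks whether CAP_SYS_ADMIN
    is now in its effective set. Returns None if the namespace can't be created."""
    pid = os.fork()
    if pid == 0:
        code = 2
        try:
            if _libc.unshare(CLONE_NEWUSER) == 0:
                with open("/proc/self/status") as f:
                    caps = effective_caps(f.read())
                code = 0 if has_cap(caps, CAP_SYS_ADMIN) else 1
        finally:
            os._exit(code)  # always leave the child here
    _, status = os.waitpid(pid, 0)
    code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else 2
    return {0: True, 1: False}.get(code)


if __name__ == "__main__":
    print(sys_admin_in_new_userns())
```

When it returns True, the capability is scoped to the new namespace: it covers resources that namespace owns, not resources owned by the initial namespace.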
Anyway, let's step back.
What is a security boundary? I would say it is a mechanism that restricts an attacker such that the attacker must exploit a vulnerability to get around that restriction. By that measure, containers are a boundary. Is exploitation difficult? Not necessarily; like I said, the Linux kernel has loads of attack surface. But containers meet a reasonable criterion for a boundary.
As an example, chroot on its own is not a boundary because an attacker with root (CAP_SYS_CHROOT) inside it can just call chroot again: this requires no vulnerability, it will never be patched, and you need another layer to prevent it. Containers have nothing like that; there is no "just let me out" syscall, so you need another vulnerability.
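The chroot case is the one chroot(2)'s own man page describes: the call doesn't change the working directory, so a privileged process can chroot into a subdirectory and then walk out with `..`. A minimal Python sketch, run in a forked child against a throwaway temp directory; it needs CAP_SYS_CHROOT (usually root) and otherwise just reports None. The function name is made up.

```python
import os
import shutil
import tempfile
from typing import Optional


def chroot_escape_demo() -> Optional[bool]:
    """Jail a forked child with chroot(2), then leave the jail using only chroot(2)
    and chdir(2). True: escaped; False: did not; None: chroot not permitted."""
    real_root = os.stat("/")
    jail = tempfile.mkdtemp()
    os.mkdir(os.path.join(jail, "inner"))
    try:
        pid = os.fork()
        if pid == 0:
            code = 2
            try:
                os.chroot(jail)  # enter the "jail"...
                os.chdir("/")    # ...with the working directory inside it
                # chroot into a subdirectory without moving the working directory:
                # cwd is now outside the new root, so ".." is no longer stopped at "/".
                os.chroot("inner")
                for _ in range(256):
                    os.chdir("..")  # climb out of the jail towards the real root
                os.chroot(".")      # make the real root the process root again
                st = os.stat("/")
                escaped = (st.st_dev, st.st_ino) == (real_root.st_dev, real_root.st_ino)
                code = 0 if escaped else 1
            finally:
                os._exit(code)  # the child never returns into the caller
        _, status = os.waitpid(pid, 0)
    finally:
        shutil.rmtree(jail, ignore_errors=True)
    code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else 2
    return {0: True, 1: False}.get(code)


if __name__ == "__main__":
    print(chroot_escape_demo())
```

Nothing here exploits a bug; it's ordinary chroot(2)/chdir(2) semantics, which is why chroot needs another layer (dropping CAP_SYS_CHROOT, for instance) to act as a boundary.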
You can read more about user namespaces here:
https://www.man7.org/linux/man-pages/man7/user_namespaces.7....