| > This is not my area of expertise but this is omitting that user namespaces tend to drastically increase the attack surface (despite what some vendors say). Configuring user namespaces for the container to improve containment = very good idea. Enabling CLONE_NEWUSER inside a container = (usually) a very bad idea. You can do one without the other, and the built-in user namespace support in Docker (and Podman) does exactly that. As one of the runc maintainers, I can say without reservation that user namespaces would have blocked the vast majority of container breakout attacks in the past decade and you absolutely should use them. The only technology with a similar track record for improving container security is seccomp. (SELinux folks will argue that SELinux deserves mention or maybe even top billing, but I have somewhat mixed opinions on that.) This is not even an unusual opinion. LXC doesn't even consider containers with user namespaces disabled part of their threat model, precisely because it's so insecure to not use them[1]. Also, in my experience, most kernel developers assume (incorrectly) that most users use user namespaces when isolating containers, and so make some security design decisions around that assumption. In every talk I've given on container security in the past few years I have urged people to use user namespaces. It is even better for each container to have its own uid/gid block. Podman, LXC and runc all support this but Docker doesn't really (though I think there was some work on this recently?). The main impediment to proper user namespace support for most users was the lack of support for transparent uid/gid remapping of mount points, but that is a solved problem now and has been for a few years (idmapped mounts, MOUNT_ATTR_IDMAP). [1]: https://linuxcontainers.org/lxc/security/ |