by plasticxme 1995 days ago
Nginx and the like are starting to provide non-privileged versions of their container images.

Running as root is lazy and amounts to container escape, especially when running on anything other than a scratch image with a read-only file system.

The only reason Nginx and Traefik run as root is to bind to privileged ports (80, 443). There is no reason to do that inside a container, since you can remap exposed ports outside the container.
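To make the remapping point concrete, here is a minimal sketch of a server that listens on an unprivileged port, so the process needs neither root nor any capability. The port number 8080 and the helper name `make_server` are illustrative assumptions, not anything Nginx or Traefik ship.

```python
import http.server

# Hypothetical unprivileged port chosen for illustration. Ports >= 1024
# can be bound by any user, so no root and no capability is needed.
LISTEN_PORT = 8080


def make_server(port=LISTEN_PORT):
    """Create (but do not start) an HTTP server bound to all interfaces."""
    handler = http.server.SimpleHTTPRequestHandler
    return http.server.HTTPServer(("0.0.0.0", port), handler)


# Usage inside the container (blocks until interrupted):
#   make_server().serve_forever()
```

The privileged port then lives only on the host side of the mapping, for example `docker run -p 80:8080 ...`, while the process inside the container stays non-root.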

Containers are not VMs and must be handled differently. You are always one RCE away from having your entire container platform compromised.

1 comments

You don't need root to open ports 80 and 443; you can instead use CAP_NET_BIND_SERVICE, which you can also grant to the container.
CAP_NET_BIND_SERVICE is one of root's privileges, split out as a distinct kernel capability that can be granted to a process. To use it, the container must allow its processes to elevate their privileges.

If the container is running as root, permitting it is redundant, since the kernel doesn't filter kernel capabilities for root anyway.

If a privileged user sets CAP_NET_BIND_SERVICE on an executable binary using setcap, so that a non-root user in a container can bind to a privileged port, elevated privileges are still required for execve to create a process that is permitted to use the kernel capability. Think sudo, but for processes.
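One way to see what a process actually ended up with is to read its capability sets from `/proc/self/status` on Linux. The sketch below is only an inspection aid, and the helper names `read_cap_sets` and `has_capability` are made up for illustration. The bit number 10 for CAP_NET_BIND_SERVICE comes from `linux/capability.h`.

```python
# Bit number of CAP_NET_BIND_SERVICE, as defined in linux/capability.h.
CAP_NET_BIND_SERVICE = 10

# Fields of /proc/<pid>/status that describe capabilities and privilege elevation.
STATUS_KEYS = ("CapInh", "CapPrm", "CapEff", "CapBnd", "CapAmb", "NoNewPrivs")


def read_cap_sets(status_path="/proc/self/status"):
    """Return the capability-related fields of a status file as strings."""
    fields = {}
    with open(status_path) as f:
        for line in f:
            key, _, value = line.partition(":")
            if key in STATUS_KEYS:
                fields[key] = value.strip()
    return fields


def has_capability(cap_mask_hex, cap_bit):
    """Check whether bit `cap_bit` is set in a hex capability mask."""
    return bool((int(cap_mask_hex, 16) >> cap_bit) & 1)


# Usage (Linux only):
#   caps = read_cap_sets()
#   print(has_capability(caps["CapEff"], CAP_NET_BIND_SERVICE))
```

If `CapEff` has the bit set, the process can bind to ports below 1024. A `NoNewPrivs` value of 1 means execve will not grant additional privileges, such as file capabilities, to the new process.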

The argument with containers is that binding to a privileged port isn’t necessary, so you shouldn’t do it. And by not doing it you improve your security posture.

Don't even need that on newer kernels and Docker 20.10: https://github.com/moby/moby/pull/41030