Hacker News
by Iolaum 1731 days ago
podman user here, because of the ability to run in rootless mode.

Using it on RPi4 with Fedora IoT running linuxserver.io containers.

I appreciate the systemd integration as well: it turns the containers into services that gracefully go down and come back up when the Pi gets rebooted, without me needing to do anything.
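
As a rough illustration of what that systemd integration can look like, here is a minimal sketch that renders a user-level unit file for an already-created container. The container name `mycontainer`, the function name, and the exact set of directives are assumptions made for the example, not the commenter's actual setup; podman releases of that era could also generate comparable units themselves via `podman generate systemd`.

```python
def render_user_unit(container: str, stop_timeout: int = 10) -> str:
    """Return the text of a systemd user unit wrapping an existing podman container.

    Hypothetical helper for illustration; the directives are a common
    hand-written pattern, not necessarily what podman itself would emit.
    """
    return "\n".join([
        "[Unit]",
        f"Description=Podman container {container}",
        "",
        "[Service]",
        "Restart=on-failure",
        # Attach (-a) so systemd tracks the lifetime of the container process.
        f"ExecStart=/usr/bin/podman start -a {container}",
        # On shutdown, ask podman to stop it, waiting up to stop_timeout seconds.
        f"ExecStop=/usr/bin/podman stop -t {stop_timeout} {container}",
        "",
        "[Install]",
        # default.target is the usual install target for user units.
        "WantedBy=default.target",
        "",
    ])


if __name__ == "__main__":
    print(render_user_unit("mycontainer"))
```

A unit like this would typically be saved under `~/.config/systemd/user/` and enabled with `systemctl --user enable`. For a rootless user's services to start at boot without a login session, lingering generally has to be enabled for that user (`loginctl enable-linger`).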


I was enamored with this feature too, but a comment here on HN[1] made me reconsider its use. Apparently rootless requires unprivileged user namespaces, which provide a different security context than the one most apps are expected to run in, and which might be less thoroughly tested than you would think.

I still like the systemd integration and that it doesn't require a daemon too, and I still favor it over Docker.

1: https://news.ycombinator.com/item?id=28393949

Edit: Clarified that it's the unprivileged user namespaces feature specifically, not namespaces in general. Thanks for the feedback solarkraft.

I am generally conservative about these things, but recently the consensus appears to be that unprivileged user namespaces are stable enough for GA. If you work in a targeted environment (e.g., you are a reporter covering civil liberties), maybe you should wait a bit longer, but for average people the advantages of rootless containers probably exceed the advantages of keeping the unprivileged_userns_clone code out of your attack surface.

I would encourage most security-conscious users to enable it and migrate to recent podman over using Docker, assuming a sufficiently recent kernel. The latest batch of major Linux OS releases have all enabled kernel.unprivileged_userns_clone, so Red Hat, Canonical et al seem to agree.

For those interested, though, you can read the anatomy of a userns clone() vulnerability here:

https://lwn.net/Articles/543273/
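
To see where a given machine stands, a minimal sketch along the following lines reads the two sysctls that most commonly gate unprivileged user namespaces. The function names and the three-way result are assumptions made for this example; `kernel.unprivileged_userns_clone` comes from a distribution patch and is absent on many kernels, while `user.max_user_namespaces` is the upstream limit. Other mechanisms (seccomp filters, LSM policies) can still block namespace creation, so "enabled" here only means these two knobs do not forbid it.

```python
from pathlib import Path
from typing import Optional


def _read_int(path: Path) -> Optional[int]:
    """Read an integer sysctl value, or None if it is missing or unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def unprivileged_userns_status(proc_sys: Path = Path("/proc/sys")) -> str:
    """Return 'enabled', 'disabled', or 'unknown' based on two sysctls.

    proc_sys is a parameter only so the function can be pointed at a
    fake directory tree in tests.
    """
    # Distro-patched knob: 0 disables unprivileged user namespaces, 1 enables them.
    clone = _read_int(proc_sys / "kernel" / "unprivileged_userns_clone")
    # Upstream knob: a limit of 0 means no user namespaces can be created.
    max_ns = _read_int(proc_sys / "user" / "max_user_namespaces")

    if clone == 0 or max_ns == 0:
        return "disabled"
    if clone is None and max_ns is None:
        return "unknown"
    return "enabled"


if __name__ == "__main__":
    print(unprivileged_userns_status())
```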

Thanks for linking the original comment. Every container uses namespaces; this is (TL;DR attempt) about user namespaces, which are a not particularly well audited kernel feature, meaning you may open up potentially insecure kernel code to unprivileged users.

The way I understand it, with containers running under a root user, to break out of a container you'd have to find a vulnerability in standard (rootful) namespaces, which is much less likely (since it's the same thing everything, including Docker, uses).

But a breakout without user namespaces is much more severe, because the process then has UID 0 on the host.
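
The "UID 0 on the host" point can be made concrete with the kernel's `uid_map` format (`/proc/<pid>/uid_map`), where each line gives a starting ID inside the namespace, the corresponding starting ID outside, and a range length. The sketch below translates a container UID to a host UID. The two sample maps are hypothetical but typical: an identity map for a rootful container, and a rootless map for a user with UID 1000 and a subordinate range starting at 100000.

```python
from typing import Optional


def host_uid(uid_map: str, container_uid: int) -> Optional[int]:
    """Translate a UID inside a user namespace to the UID seen on the host.

    uid_map uses the /proc/<pid>/uid_map layout: "<inside> <outside> <count>"
    per line. Returns None if the UID is not mapped at all.
    """
    for line in uid_map.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        inside, outside, count = map(int, fields)
        if inside <= container_uid < inside + count:
            return outside + (container_uid - inside)
    return None


# Hypothetical sample maps, for illustration only.
ROOTFUL_MAP = "0 0 4294967295"            # no remapping: root inside is root outside
ROOTLESS_MAP = "0 1000 1\n1 100000 65536"  # root inside is the unprivileged user 1000

if __name__ == "__main__":
    print(host_uid(ROOTFUL_MAP, 0))    # container root is host UID 0
    print(host_uid(ROOTLESS_MAP, 0))   # container root is host UID 1000
    print(host_uid(ROOTLESS_MAP, 33))  # other UIDs land in the subordinate range
```

With the rootful map, a process that escapes as container root is root on the host; with the rootless map, the same escape lands on an ordinary user account, which is the trade-off being weighed against the extra kernel attack surface.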

Frankly, if you are that concerned about security (e.g. you have multitenant workloads or are dealing with sensitive data), you should be using KVM or gVisor.

If you're using user namespaces and allowing a previously root-only API to be used by an unprivileged user, it's possible that a bug in that API which grants root access (and which wouldn't be as big a deal if the API could only ever be called by root) allows root-level escalation outside the container.

That interacts in different ways with the security concerns about running as root directly, and it can be more or less problematic than a root container depending on the use case, and also more or less likely depending on how well those APIs are exercised for the specific use case.

It's not that these should be avoided, it's just that people should be aware that it's not necessarily a pure security increase at the expense of a bit of extra CPU due to kernel checks. There's a bit to consider. Maybe later everyone will consider this tested enough that it's mostly a pure win. Maybe it's already at that point but people haven't internalized it. I don't know enough to know what stage we're at, but I thought it was worth mentioning, as it took me by surprise when I learned of it.

"the vulnerability" means one specific vulnerability in docker or somesuch "privileged container" i presume. there are also some sleeping in the kernel code that userns opens up, and even outright intentionally allow, that were previously not on the radar.

ultimately we must consider userns vs privileged-ns a fork in the road. one direction sweeps privilege concerns under the rug, and opens up new attack surface today leaving the door open for more non-obvious problems tomorrow. the other relies on highly competent engineers that know the nuances of the system they are working with, and have strong will to stomp out needless complexity from design to implementation.

It’s great that podman serves a good role for you, and I’m not going to argue that. My points:

1. Docker containers absolutely can be run without root. Yes, it’s not the default policy, but containers can have a user ID. If you are referencing the daemon-less root-less nature of podman, that’s a clear advantage of podman vs Docker.

2. Docker containers also have a restart policy, which I also use to have them start up on machine reboot. By graceful, you must mean sending SIGTERM to the containers, which Docker does as well.

Perhaps podman does these things better, but I want to point out that Docker does have many features for better or for worse.
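
On the graceful-shutdown point: both `docker stop` and `podman stop` send SIGTERM first and only fall back to SIGKILL after a grace period, so "graceful" in practice depends on the containerized process handling SIGTERM. A minimal sketch of such a handler follows; the `serve` loop and its return value are placeholders invented for this example.

```python
import os
import signal
import threading

# Set by the signal handler; the main loop polls it to know when to stop.
stop_requested = threading.Event()


def _on_sigterm(signum, frame):
    """Record the stop request; the real cleanup happens in the main loop."""
    stop_requested.set()


signal.signal(signal.SIGTERM, _on_sigterm)


def serve(poll_seconds: float = 0.5) -> str:
    """Placeholder main loop that exits once SIGTERM has been received."""
    while not stop_requested.wait(poll_seconds):
        pass  # real work (handling requests, flushing queues) would go here
    # Cleanup such as closing files or sockets would happen before returning.
    return "clean shutdown"


if __name__ == "__main__":
    print(f"running as PID {os.getpid()}, send SIGTERM to stop")
    print(serve())
```

A process that ignores SIGTERM is simply killed once the grace period expires, regardless of which container engine or restart mechanism is in use.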

When people talk about “rootless” in this context, they’re not talking about “launching containers whose process runs as non-root”. The innovation of podman is that a non-root user can “safely” be given permission to launch containers whose maximum permissions are “the perms that user has”.

Docker doesn’t have this functionality: the daemon runs as root, and anybody who is granted access to launch containers by invoking Docker commands can inherently access root-level privileges. The most mundane way to do this is to launch a container with the host’s namespaces instead of generating new ones.

It's not the default setup and not trivial to do, but Docker has had the ability to run the daemon as a non-root user for a few years. The standard .deb and .rpm packages even include scripts to automate the transition for you on Debian and Red Hat Linux variants. See https://docs.docker.com/engine/security/rootless/

The only thing podman gives you that docker itself can't is running without a daemon at all.

> root-less nature of podman

I see this repeated a lot, but it's not the default; it has to be explicitly configured: https://github.com/containers/podman/blob/v3.3.1/docs/tutori...

And in addition to the known upsides, there are some lesser known downsides:

1. There are feature limitations with it: https://github.com/containers/podman/blob/v3.3.1/rootless.md

2. There are security implications, quoting Arch Wiki:

> Warning: Rootless Podman relies on the unprivileged user namespace usage (CONFIG_USER_NS_UNPRIVILEGED) which has some serious security implications, see Security#Sandboxing applications for details.

Also worth noting that Docker itself has a rootless mode as well by now: https://docs.docker.com/engine/security/rootless/

I'm happy that there are Docker alternatives, but I have the feeling that podman has been hyped a lot recently and many articles and comments give the impression that it's more secure by default and without any downsides.

Why should I trust the Arch wiki? People like Christian Brauner think the value of not running as UID 0 outweighs the increased attack surface from the user namespace.

https://people.kernel.org/brauner/runtimes-and-the-curse-of-...

Thanks for writing this. He knows what he is talking about. One of the LXC maintainers.

AFAIK, the default packaging on Fedora enables rootless podman without additional configuration.

Probably worth pointing out that docker has had a "userns-remap" option for quite a while, which causes all containers to run in a separate user namespace where UID 0 inside remaps to something else outside, so theoretically a user with access to the docker daemon isn't able to view the outside filesystem as root[0].

I have gone back and forth with podman. At some point it seemed to sometimes get into a funny state where I would simply delete everything[1] to fix it. On all systems where I run docker, I make sure to have "userns-remap": "default" in /etc/docker/daemon.json. Haven't looked into the rootless mode yet, but I was aware a few years ago that they were working on it.

[0] Without remapping root inside a different namespace, anyone with access to the docker daemon can access the outer root filesystem as root using a command such as `docker run --rm -it -v /:/oops alpine`

[1] Amusingly, the simplest way to do this was by using root to run `rm -rf` on my own ~/.local/share/containers/ directory, since the containers used UIDs other than my own (ones that are part of my subuid range)
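
For the "userns-remap": "default" setting mentioned above, a small sketch like the following merges the key into whatever is already in daemon.json rather than overwriting the file. The function name and the sample `log-driver` entry are assumptions made for the example; writing the result back to /etc/docker/daemon.json and restarting the daemon are left out on purpose.

```python
import json


def with_userns_remap(daemon_json: str, value: str = "default") -> str:
    """Return daemon.json text with "userns-remap" set, keeping other keys.

    daemon_json is the current file content; an empty string is treated
    as an empty configuration.
    """
    config = json.loads(daemon_json) if daemon_json.strip() else {}
    config["userns-remap"] = value
    return json.dumps(config, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical existing configuration, for illustration only.
    existing = '{"log-driver": "journald"}'
    print(with_userns_remap(existing))
```

Per the Docker documentation, the value "default" makes the daemon create and use a dedicated `dockremap` user for the mapping. Existing images and containers are stored per mapping, so they will not be visible after the switch.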

Thanks