by qbasic_forever 1480 days ago
Rootless podman is my first choice for using containers now, it works fantastically well in my experience. It's so much nicer to have all my container related stuff like volumes, configs, the control socket, etc. in my home directory and standard user paths vs. scattered all over the system. Permission issues with bind mounts just totally disappear when you go rootless. It's so much easier and better than the root privileged daemon.

I really wish rootless podman/docker was the default install now. It's still kind of annoying to set up: you end up reading a smattering of old docs and having to think about your distro setup, cgroups settings, etc. It really should just be a "run this install script and you're done".


The problem with rootless is that you don't get a native network stack since setting up bridges and veth devices still requires some elevated capabilities. But instead of running full root this could be outsourced to a helper executable with some caps set (a narrower version of suid).

> Permission issues with bind mounts just totally disappear when you go rootless.

Recent kernel versions have gained uid mapping capabilities on mounts. Hopefully future docker will make use of it. Then we can run entire containers as different users.

TBH I don't want to expose the native network stack to containers
If your concern is kernel attack surface then I have bad news for you. Inside the container's network namespace it's still using standard syscalls. Only on the host side does it take a detour through userspace. So you get all the downsides, none of the native performance and very few upsides. It only benefits firewalls that still assume a machine has a single network interface without bridging/natting/forwarding.
Are you saying that all files from your containers are owned by you as user? If so I will start investigating right now. It is so super annoying to download something with nzbget for example and then having to go through sudo to get to your downloaded files. It is indeed my major gripe with my docker compose setup atm.

Or just messing with a html file in the nginx docker bind mount, ugh!

If podman solves that I’m going all in tomorrow.

> Are you saying that all files from your containers are owned by you as user? If so I will start investigating right now

You can do this with Docker today without much fuss.

Here's a bunch of web app examples (Flask, Rails, Django, Node, Phoenix) that run your containers as a non-root user which ensures any volume mounted files end up being set to your Docker host's user along with running your main process as a non-root user: https://github.com/nickjj?tab=repositories&q=docker-*-exampl...

There's no hard coding of user names either. The user name created in the Docker image never directly gets mapped back to your Docker host.

This works because bind mounts happen over uid:gid 1000:1000 by default, so as long as your Docker host user's uid:gid is 1000:1000 everything works out of the box. On Windows and macOS you don't need to even think about this because Docker Desktop will fix permissions for you and on native Linux chances are your user uid:gid is 1000:1000 because it's the first non-root user on your system. For non-controllable environments like CI you'd typically disable volumes which is a good idea anyways because you're probably not using volumes in production. For single server deploys on a self managed VPS you control the environment. I covered this in a little more detail in my DockerCon talk at: https://nickjanetakis.com/blog/best-practices-around-product...

In the worst case scenario where you have no other options you can make the Dockerfile more complicated and introduce build args for the uid:gid so you can change it to satisfy the needs of a specific host but I don't like this since you'd need to rebuild a different image for a different environment, but it would technically work. I've never run into this scenario after having used Docker since 2014. I've also done contract work for dozens of companies in all sorts of different environments.

That's fair, but that issue is more common than you think. Some folks use Linux desktop systems with multiple users: shared computers (family or university--not all lab environments have sane workstation user management, unfortunately), or a personal computer with multiple accounts for separation (e.g. a home and work user) both come to mind.

And sure, UID remapping is available, but that's no longer in the realm of "just works".

You can perform the step in my last paragraph to make it work in those cases. It would come down to introducing 2 new build args, making sure the user you create sets the uid:gid based on these values, defaulting to 1000:1000 so it works for most but allows you to override them by modifying 2 env variables in an .env file (you can configure docker-compose to set the build args with env vars).

If someone has a case where they are doing anything you described the above steps can be implemented in like 5 lines of code and 5 minutes. I didn't add it to my example apps because there's only been 1 or 2 requests for it over multiple years and I've never encountered it once, no one taking my Docker courses has ever hit a road block by it either.
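
For anyone who wants to see the build-arg route spelled out, here is a minimal sketch in Python that assembles the `docker build` flags from the current host user. The arg names `UID` and `GID` are hypothetical: the Dockerfile would need matching `ARG UID=1000` and `ARG GID=1000` lines and would have to create its user from them, which is an assumption about your image rather than something the example apps do today.

```python
import os


def uid_gid_build_args(uid=None, gid=None):
    """Return docker build flags that pass a uid:gid pair as build args.

    UID and GID are hypothetical arg names; the Dockerfile must declare
    them with ARG and use them when creating its non-root user.
    """
    uid = os.getuid() if uid is None else uid
    gid = os.getgid() if gid is None else gid
    return ["--build-arg", f"UID={uid}", "--build-arg", f"GID={gid}"]


# Print the command for the current host user instead of running it.
print(" ".join(["docker", "build", *uid_gid_build_args(), "."]))
```

The same two values could go into an .env file for docker-compose instead of being computed on the fly.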

The --uidmap and --gidmap options can map your regular user on the host to any specific user inside the container.

These options may look to be a bit complicated to use, but as soon as you understand how rootless Podman maps UIDs and GIDs it will be pretty straight forward.

I wrote two troubleshooting tips about how to use them:

https://github.com/containers/podman/blob/main/troubleshooti...

https://github.com/containers/podman/blob/main/troubleshooti...
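
A rough sketch of how the `--uidmap` flags fit together, written in Python so the arithmetic is visible. As far as I understand it, in rootless mode the middle field counts positions in your own mapping: 0 is your host user and 1 and up are your subuids. The function name and the default of 65536 subuids are assumptions for illustration; check your own /etc/subuid entry.

```python
def uidmap_args(container_uid, subuid_size=65536):
    """Build --uidmap flags so the invoking host user shows up as
    container_uid inside a rootless container.

    subuid_size is the number of subordinate UIDs available to the user
    (assumed 65536 here).
    """
    if not 0 < container_uid < subuid_size:
        raise ValueError("container_uid must be between 1 and subuid_size - 1")
    return [
        # The host user (intermediate UID 0) becomes container_uid.
        "--uidmap", f"{container_uid}:0:1",
        # Container UIDs below container_uid take the first subuids.
        "--uidmap", f"0:1:{container_uid}",
        # Everything above container_uid takes the remaining subuids.
        "--uidmap",
        f"{container_uid + 1}:{container_uid + 1}:{subuid_size - container_uid}",
    ]


# Example: make the host user appear as uid 33 inside the container.
print(" ".join(uidmap_args(33)))
```

The matching `--gidmap` flags follow the same pattern with group IDs.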

Root inside the container is the same as your user.
No it’s not. A file written from inside the container into a mounted volume as root will be owned by root outside the container (uid 0, to be specific; doesn’t matter what the user is named).

Edit: I might have misunderstood parent, who might be referring to Podman attempting to manage the uid mapping.

The parent comment is still talking about rootless podman (and really just user namespaces). Root in the container is absolutely mapped to the user executing podman outside the container.

If it mapped to root outside the container, you could just use podman to create setuid scripts owned by root for very trivial privilege escalation.

Yes, I think you are right; I was mistaken. Docker without rootless mode operates in the way I described.
Last thing I remember, you can tweak your /etc/subuid and /etc/subgid to properly map between the user inside the container and outside.
Something that can't be solved by PUID/PGID in the command or in Compose?
The arch wiki is my go to every time I install Podman, and it’s a little easier every time. It’s down to like two steps now, with no file editing. We’ll get there.
If you are on Linux, there is the fantastic podman option "--userns keep-id" which will make sure the uid inside the container is the same as your current user uid.
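
A small sketch of what that looks like as a command line, built in Python so it can be printed and inspected before running. The image name and the paths are placeholders, and the helper name is made up for this example.

```python
import shlex


def keep_id_run_command(image, host_dir, container_dir, *command):
    """Build a podman run command that keeps the caller's uid inside the
    container and bind mounts host_dir at container_dir."""
    return [
        "podman", "run", "--rm",
        "--userns=keep-id",
        "--volume", f"{host_dir}:{container_dir}",
        image,
        *command,
    ]


# Placeholder image and paths; listing the mount with numeric owners
# should show your own uid rather than root.
cmd = keep_id_run_command(
    "docker.io/library/alpine", "/home/me/data", "/data", "ls", "-ln", "/data"
)
print(shlex.join(cmd))
```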
> Permission issues with bind mounts just totally disappear when you go rootless.

I have a problem with mounting a named volume foo in a container (at /foo) and bindfs-ing the underlying directory of that volume onto ${HOME}/foo with the create-for parameters, so that when the host user touches files in it they are owned by 1000:1000 on the host but by 33:33 inside the container.

Volume foo really contains only a unix socket. This unix socket is shared between the host and the container for xdebug communication.

So, this doesn't work, the container process can't write/read the socket even though it can manipulate other files in the mounted volumes /foo and they appear as owned by 1000:1000 on the host and vice versa.

But if I mount the volume directly like that: ${HOME}/foo:/foo then it works and the container can write to the socket and the host and the container can communicate both ways.

Would rootless podman allow me to use a named volume ? Why doesn't it work like I think it should, is it because the unix socket lives in the kernel 'or something' ? Maybe it's a question for SO.

It probably is a question for SO, where it would be best described with a script that sets up the minimum environment required to reproduce and the situation that you want to achieve. This description doesn't quite get me to an understanding of the problem, but that may be a personal issue.
Change the ids of the user inside the container to match what's needed on the host or take a look at subuid's.
> Change the ids of the user inside the container to match what's needed on the host or take a look at subuid's.

Oh, I tried running the process in the container under a different uid but it complained about unprivileged user. Going the other way may do it, thanks. Although it requires more fiddling with stock images.

What about UID issues? I remember using it years ago and sometimes having permission issues in containers when mounting local files. How is that nowadays? I much prefer running this in a rootless manner also. What about docker compose? Is there an alternative for podman?
Yeah in my experience with rootless you don't need to worry about UID shenanigans anymore. Containers can do stuff as root (from their perspective at least) all they want but any files you bind mount into the container are still just owned/modified by your user account on the host system (not a root user bleeding through from the container).
How does that work in practice? Podman is changing the permission bits of files that are synced between the host and the container?

If I create a file with certain permission bits in the container, I'd expect the file to be 100% identical when pulling it over to the host, but maybe that's just "legacy" thinking coming from my docker experience?

What about copying files directly between containers, would that change the permissions as well?

The permissions (rwx) don't change, but the uid/gid is mapped. E.g. uid 0 is the running user outside the container, but uid 1 will be mapped to 100000 (configurable), and say 5000 inside the container is mapped to roughly 105000. I don't remember the exact mapping but it works roughly like that.
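
To make the arithmetic concrete, here is a minimal Python sketch of the default rootless mapping. It assumes a host uid of 1000 and an /etc/subuid entry like `me:100000:65536`; both values and the function name are assumptions for illustration, so substitute your own.

```python
def container_to_host_uid(container_uid, host_uid=1000,
                          subuid_start=100000, subuid_count=65536):
    """Translate a uid seen inside a rootless container into the uid the
    host sees, for the default single-range mapping."""
    if container_uid == 0:
        # Root in the container is just the user who ran podman.
        return host_uid
    if 1 <= container_uid <= subuid_count:
        # Container uid 1 takes the first subuid, 2 the second, and so on.
        return subuid_start + container_uid - 1
    raise ValueError("uid is outside the mapped range")


for uid in (0, 1, 33, 5000):
    print(uid, "->", container_to_host_uid(uid))
```

Under those assumptions uid 5000 lands on 104999, one below the round number, because container uid 1 already uses the first subuid.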
Oh, that sounds great! Thank you, that was the information I wanted.
>If I create a file with certain permission bits in the container, I'd expect the file to be 100% identical when pulling it over to the host

The permission bits are metadata on the fs, the file can still be identical.

Plus, how can the permissions on a file in the container be identical on the host if, e.g., the groups/users are different?

There’s podman-compose which does what you want, but is a community maintained script.

There’s also the ability for podman to run as a system service and provide a Docker-compatible API. This then integrates seamlessly with the actual docker-compose.

See: https://www.redhat.com/sysadmin/podman-docker-compose

You can point the official docker-compose at podman now!

I do that!

It's 99.999% compatible as the podman people basically reimplemented all the docker daemon APIs.

It sometimes lags a bit behind, because sometimes docker implements new stuff... But for usage with docker-compose it has worked flawlessly for me.

EDIT: you can also export the podman unix socket via socat. I also tried using it to run a rootless docker runtime in kubernetes (podman daemon running as a pod, to run docker builds in kubernetes) as an experiment. It works but I'd love to see better integration with the Gitlab runner project.

Gitlab is supposedly getting podman support any time soon, in 15.1 IIRC ?

I tried podman compose quite a few months ago with my docker compose file and it failed, what is the difference between doing that and this?

So like, you can use docker compose for podman instead of docker, instead of something like podman compose?

The difference is that instead of using podman-compose you use the actual docker-compose.

You have to point your $DOCKER_HOST to the podman unix socket or something, but other than that it’s the actual docker-compose experience. Via the env var trick you could even use the actual docker binary. But you could just alias docker to podman and it works the same, by design!
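
A minimal sketch of the env var trick in Python. It assumes the rootless socket sits at `$XDG_RUNTIME_DIR/podman/podman.sock`, which is where it ends up on systemd distros once `podman.socket` is enabled for your user; the helper name is made up, and your socket path may differ.

```python
import os


def podman_docker_host(uid=None, runtime_dir=None):
    """Return a DOCKER_HOST value pointing at the rootless podman socket.

    The path layout is an assumption based on the usual systemd user
    socket location.
    """
    if runtime_dir is None:
        runtime_dir = os.environ.get("XDG_RUNTIME_DIR")
    if runtime_dir is None:
        uid = os.getuid() if uid is None else uid
        runtime_dir = f"/run/user/{uid}"
    return f"unix://{runtime_dir}/podman/podman.sock"


# Environment to hand to docker-compose, e.g. via subprocess.run(..., env=env).
env = dict(os.environ, DOCKER_HOST=podman_docker_host())
print(env["DOCKER_HOST"])
```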

Wow that sounds amazing! Thanks for the information.
Nah no good alternative yet, I tried running my docker compose file which works perfectly with docker in podman compose and it failed outright.
I don't know. When I tried it I got a lot of networking issues with components whose very existence was barely documented online.