by dandarie 1842 days ago
> The problem with this approach is that is not portable. What if I am developing using more than one computers where in each computer my user has different ID?

Make the build script use local $USERID and $GROUPID as args during the build process.

In docker-compose.yml (or, if using docker directly, using --build-arg):

    build:
      context: ./build
      args:
        USERID: ${USERID}
        GROUPID: ${GROUPID}

So you're passing the local uid and gid as variables to the build process. (1)

In build/Dockerfile:

  FROM image:tag
  WORKDIR "/application"
  ARG USERID
  ARG GROUPID

  RUN if [ ${USERID:-0} -ne 0 ] && [ ${GROUPID:-0} -ne 0 ]; then userdel -f www-data ;fi \
    && if getent group ${GROUPID} ; then groupdel www-data; fi \
    && groupadd -g ${GROUPID} www-data && useradd -m -l -u ${USERID} -g www-data www-data -s /bin/bash

(1) $USERID and $GROUPID might not be available as environment variables on your system. To set them, place this in your .bashrc:

  export USERID=$(id -u)
  export GROUPID=$(id -g)

But that doesn't solve the problem, just works around it:

1. Images are still pre-baked with a given UID/GID pair, so you can't distribute them as something universal and reusable.

2. This requires workarounds / extra steps on a local workstation, so it doesn't work for everyone unless they follow a given project's unique quirks setup.

Shell/compose duct tape like this doesn't make for a great experience. This really should be solved by upstream projects themselves, as it's an extremely common issue when attempting to use Docker.

It's a feature for a multi-tenant deployment if you use user remaps. Maybe you only allow specific tenant containers with tenant specific uid/gid.

1. Nope, they are not pre-baked. They are built at runtime from env vars on each machine.

2. One step, setting up two vars. They can be set by a build script. Lots of things have build scripts way more complicated than this.

The only tedious thing is you have to adapt this for every image type you run.

> The only tedious thing is you have to adapt this for every image type you run.

The tedious thing is that this escalates into complexity whenever you have to deal with K developers using M projects developed by N teams each using a different way to handle this:

Do I need to set USERID for project foo, or UID? Does it default to 1000 or the author's UID? Oh, someone has a problem with our project, did they remember to set COMPANY_USERID in their bashrc? Oh, wait, they're using zsh, how do you do that there? Oh, but they followed this other project's readme and that set COMPANY_USERID but not COMPANY_GROUPID...

Docker is supposed to simplify this by unification and a limited API surface, and applying hacks like this on top kind of kills that whole premise.

> Do I need to set USERID for project foo, or UID? Does it default to 1000 or the author's UID? Oh, someone has a problem with our project, did they remember to set COMPANY_USERID in their bashrc? Oh, wait, they're using zsh, how do you do that there? Oh, but they followed this other project's readme and that set COMPANY_USERID but not COMPANY_GROUPID...

You set it to the output of id -u and id -g. It's two lines. There are definitely lots of things more complex when dealing with docker than this.

You provide the team with a script containing those two lines and a docker-compose wrapper and you're set.
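
A minimal sketch of such a wrapper, assuming bash and the USERID/GROUPID names from the compose file earlier in the thread. The `compose` function name and the `COMPOSE_BIN` override are made up for illustration; neither is a Docker feature.

```shell
#!/usr/bin/env bash
# Export the caller's numeric IDs unless they are already set.
export USERID="${USERID:-$(id -u)}"
export GROUPID="${GROUPID:-$(id -g)}"

# Hypothetical wrapper: forwards every argument to Compose unchanged.
# COMPOSE_BIN is an assumed override hook for this sketch only.
compose() {
  "${COMPOSE_BIN:-docker-compose}" "$@"
}
```

After sourcing this file, `compose up -d` would run docker-compose with both IDs in its environment, so the build args in docker-compose.yml can pick them up.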

Of course it would have been better not to have to care about these things, but hey, at least you're not installing and configuring 4-5 services to bootstrap an application.

If you have to build it on each machine, I would not consider that easily/universally distributable. One of the key points of Docker is you can build once (in your CI or someone else's) and run it on any machine. I think that was GP's point.
Sure, great, let me just rebuild all my docker images on every single machine they run on thereby completely defeating the point of having images in the first place.
You start from a base image of your choice. You only build the user replacement part.

You run docker-compose build ONCE and you're set. On my machine, it takes five seconds.

Heck, you can even run docker-compose build every time you start the application; it will use the cached build and take less than one second.

---

Correction: the docker-compose up -d takes care of the build process the first time it runs.

Literally, it takes longer to complain about the issue than to build the image ONCE.

And reproducibility goes straight out the window. And how do you even interop this with kubernetes?
The solution is for docker-compose (or plain docker).

I don't think the reproducibility is out. It's the same app, the same image, the same intended user, you just inject, once, the local user and group ids.

but that _requires_ you to build-at-runtime, which is sometimes not the best way to deploy a docker app. if you have one app that you want to run on many nodes, you'll want to set up a docker registry and have the nodes pull pre-built images.
Of course, but really only build once on every machine. The subsequent starts use the cached build, even after reboot.

In fact, docker-compose up -d takes care of the build thing by itself. It's a five second tradeoff for the lifetime of the application.

For anyone who uses immutable infrastructure, where a server's configuration is never modified once built and subsequent deployments replace it with entirely new VMs, building once per machine still happens every time there is a deployment. You don’t ever reboot these machines.

In environments where vulnerability scanning of docker images used is important, running anything in production that isn’t stored in a docker registry kind of breaks things.

This approach also won’t work with container orchestrators like Kubernetes, ECS, Lambda, CloudRun, etc.

Where I can see a docker build of a small layer that just sets file perms potentially being useful is for container-based dev environments run on laptops and workstations.

This has been a major Docker pain point, and not many people know about this trick. I didn't know you could have the variables in the Compose file directly, does that really work?

Our approach so far was to add yet another layer (a script to pass uid/gid to Compose), but if we don't need the script that would be fantastic.

EDIT: Ah, I just saw the bashrc wrinkle you mention. Yeah, that's why we had the script, and it's a damn shame Docker can't do this natively. It has been a major hassle.

> I didn't know you could have the variables in the Compose file directly, does that really work?

Yep, it's because the build args get read in from a .env file by default and then from there Docker Compose sends those build args to Docker when it builds the image.
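
A small sketch of how the .env route could be filled in, assuming the USERID/GROUPID names from the top comment and a .env file sitting next to docker-compose.yml. The helper name `print_ids_env` is hypothetical.

```shell
# Hypothetical helper: print the current user's IDs in .env syntax
# (KEY=value, one assignment per line).
print_ids_env() {
  printf 'USERID=%s\nGROUPID=%s\n' "$(id -u)" "$(id -g)"
}
```

Running `print_ids_env >> .env` once in the project directory would append both values, so nothing has to go into .bashrc. It appends rather than overwrites because the .env file probably already holds other vars.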

This was one of the topics from my talk at DockerCon last week (creating a production ready Docker Compose set up). The video and 6,000 word blog post for it will be coming out tomorrow. Both things will be added to the talk's reference links at https://github.com/nickjj/dockercon21-docker-best-practices.

That's interesting, thanks! My shell sets the USER variable (but no USERID or GROUPID), which might be good enough for all our developers, but probably not reliable enough for a general audience.
Honestly in practice everything tends to work fine without any hacks or extra scripts.

I run all of my containers as a non-root user and create the user in the image with its default values of 1000:1000 for the uid:gid. I haven't bothered to expose the uid:gid as build arguments because it's pretty much never an issue in development or production.

With a uid:gid of 1000:1000 built into the image any bind mounted files end up being correctly owned by the Docker host's user under the following conditions:

- Docker Desktop on macOS

- Docker Desktop on Windows using WSL 1

- Docker Desktop on Windows using WSL 2 and native Linux (as long as your dev box's user is set to 1000:1000)

IMO it's really rare that your dev box's user wouldn't be 1000:1000 on native Linux or WSL 2.

In production you also have full control over the uid:gid of your deploy user.

The only time where it kind of stinks is CI, but it's super easy to get around this by simply not using volumes in CI.

I have a bunch of examples of this pattern at:

    - https://github.com/nickjj/docker-flask-example
    - https://github.com/nickjj/docker-django-example
    - https://github.com/nickjj/docker-rails-example
    - https://github.com/nickjj/docker-phoenix-example
    - https://github.com/nickjj/docker-node-example
    - https://github.com/oleksandra-holovina/docker-play-example
> IMO it's really rare that your dev box's user wouldn't be 1000:1000 on native Linux or WSL 2.

Any company-wide (GNU/)Linux deployment that uses LDAP or some other centralized user directory will not have devs with UID/GID 1000:1000. Hope is not a strategy.

> Any company-wide (GNU/)Linux deployment that uses LDAP...

You can go the extra mile and turn the UID:GID into build args like the original parent and you're good to go. No hacks necessary, and since it's all self-contained in a .env file there's nothing extra you need to run, since you're likely using an .env file already for other vars.

Alternatively you could do this: https://news.ycombinator.com/item?id=27344491

In either case you can solve the problem without too much effort.

> IMO it's really rare that your dev box's user wouldn't be 1000:1000 on native Linux or WSL 2.

Any major company using LDAP/AD or other forms of centralized user management won't be able to make that guarantee.

> In production you also have full control over the uid:gid of your deploy user.

If you're running in an un-managed environment, yes - managed hosting of any kind generally doesn't provide these guarantees.

Hm, you're right, I guess I've seen a non-1000 user very rarely. However, for a company of tens to hundreds of people where you want them to be able to develop locally, you might very well hit this issue, and if you hardcode 1000 it's going to be hard for them to work around it.

This method works well until it doesn't work at all, and I think I would prefer one that works slightly less well but also had an easier way to override it. Then again, I might try this and see if we ever hit an issue, thanks!

IIRC on Arch, unless you create your own group, you're part of the users group, with GID 100
maybe i did something weird last time i installed ubuntu, but my user is 1001:1002 and the default ubuntu user is 1000:1001
Within docker-compose.yml I use

  services:
    foo:
      image: foo/bar:6.9
      user: ${UID:-1000}:${UID:-1000}
On Linux with Bash it runs as your current user, and on most other platforms it runs with id 1000, which is set up as the default user in the Dockerfile. This is no problem on macOS or Windows because of the way Docker Desktop uses VMs.

ZSH or other shells don't necessarily set $UID, so if you're running Linux, not id 1000 and not running Bash you might need a little .env file with `UID=1001` in it to make it work. And then the user is still nameless in the container. This is kind of rare and I only use it for dev containers where most relevant files (and permissions) are bind-mounted from the host, so it hasn't really been a problem in practice.

Remaps would be cleaner but I find it too much work to explain for normal developers just wanting to use a dev container.

In my experience, UID is not always available to docker-compose.yml because it isn't exported (at least in bash).

See more here: https://stackoverflow.com/a/50900530/15428104

  $ declare -p UID
  declare -ir UID="1000"

The -x option is missing.
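
To illustrate why the missing -x matters, here is a sketch using a stand-in variable. `demo_id` is a made-up name; bash's real UID is readonly, so the demo avoids touching it. A child process only sees variables that are marked for export.

```shell
unset demo_id
demo_id=1000    # plain shell variable: set, but not exported (like UID in bash)
before=$(sh -c 'echo "${demo_id:-unset}"')

export demo_id  # now marked for export, i.e. it has the -x attribute
after=$(sh -c 'echo "${demo_id:-unset}"')

echo "before export: $before"   # before export: unset
echo "after export:  $after"    # after export:  1000
```

docker-compose is a child process of the shell too, which is why a non-exported UID never reaches the compose file.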

This is excellent, thank you.
Containers are ideally meant for a single service. The best way I've found is to just pass the `--user` flag to `docker run` and have the service run as whatever user it is that you want. The only challenge is that you need to make sure that the volume mounts are already created on the host with the correct permissions.
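
A rough sketch of that `--user` workflow, assuming a bind mount that has to exist on the host before the container starts. The helper name `run_as_me` and the /data mount point are placeholders, and the function only prints the command (a dry run) rather than calling docker.

```shell
# Hypothetical helper: create the host directory first so it is owned by
# the invoking user, then print the docker command that would use it.
# Pass an absolute host path; /data inside the container is a placeholder.
run_as_me() {
  host_dir=$1
  image=$2
  mkdir -p "$host_dir" || return 1
  echo docker run --rm \
    --user "$(id -u):$(id -g)" \
    -v "$host_dir:/data" \
    "$image"
}
```

For example, `run_as_me "$PWD/data" some/image:tag` prints the full command; dropping the `echo` would actually run it. Creating the directory up front matters because Docker would otherwise create the missing bind-mount source itself, typically owned by root.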
That runs the container as a given user, but doesn't prevent the container from running some processes as a different internal user.
If you built the container or inspected it before running you should know what the container is doing. Again, containers like Docker aren't really "meant" to run multiple processes. They are meant to run a single process and your app should be able to run as whatever user you run the container with. If you want to run multiple processes or services inside a single container then ultimately you're better off with a different container solution.
> more than one computers where in each computer my user has different ID

Decades of network filesystem users have had many solutions to that.

I can think of basically two solutions:

1) pass user/group names around and resolve them at the destination to UID/GID; 2) ignore them entirely; assign ownership of all newly created files to the currently authenticated user (if authorized).

Are there other ones?

3) treat a machine-id/user-id pair as the “real userid”; 4) add a remote->local userid mapping feature to your filesystem.
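
As a sketch of option 1 (pass names around, resolve at the destination): the receiving machine looks the name up in its own user database instead of trusting a numeric ID from elsewhere. `resolve_uid` is a hypothetical helper name; `id -u NAME` is the standard lookup.

```shell
# Hypothetical helper: resolve a user name to this machine's numeric UID.
# Fails (non-zero exit) if the name is unknown locally.
resolve_uid() {
  id -u "$1"
}
```

The sender would transmit only the name, and each destination would call something like `resolve_uid alice` to get whatever UID that user has locally.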