by matheus-rr 112 days ago
The --mount=type=cache for package managers is genuinely transformative once you figure it out. Before that, every pip install or apt-get in a Dockerfile was either slow (no caching) or fragile (COPY requirements.txt early and pray the layer cache holds).

What nobody tells you is that the cache mount is local to the builder daemon. If you're running builds on ephemeral CI instances, those caches are gone every build and you're back to square one. The registry cache backend exists to solve this but it adds enough complexity that most teams give up and just eat the slow builds.

The other underrated BuildKit feature is the ssh mount. Being able to forward your SSH agent into a build step without baking keys into layers is the kind of thing that should have been in Docker from day one. The number of production images I've seen with SSH keys accidentally left in intermediate layers is genuinely concerning.


There is something wrong with the industry in which we think that, when a production build requires SSH keys, the problem is that the keys might leak into the build artifact.
Keys leaking into the build artifact was never the concern.

It's about not having the private keys stored unknowingly in intermediate layers of a build container.

Those intermediate layers are usually part of the artifact. Try exporting an image with docker save and investigate what’s inside. This is all documented in a mostly comprehensible manner in the OCI specs.
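
For anyone who wants to check an image of their own, here is a rough Python sketch that walks the tarball written by docker save and lists files in any layer whose names look like key material. The SUSPICIOUS patterns and the function name are my own assumptions, not part of any Docker tooling, and it reads each blob fully into memory, so treat it as a starting point rather than a real scanner.

```python
import io
import tarfile

# Hypothetical name fragments that suggest private key material; adjust to taste.
SUSPICIOUS = ("id_rsa", "id_ed25519", "id_ecdsa", ".pem")


def find_suspicious_files(image_tar_path):
    """Return (blob_name, path_in_layer) pairs for files that look like keys.

    Every regular file in the outer archive that is itself a tar archive
    is treated as a layer; anything else (manifest, config JSON) is skipped.
    """
    hits = []
    with tarfile.open(image_tar_path) as outer:
        for member in outer.getmembers():
            if not member.isfile():
                continue
            data = outer.extractfile(member).read()
            try:
                # "r:*" lets tarfile detect plain or compressed layers.
                layer = tarfile.open(fileobj=io.BytesIO(data), mode="r:*")
            except tarfile.TarError:
                continue  # not a tar archive, so not a layer
            with layer:
                for name in layer.getnames():
                    if any(s in name for s in SUSPICIOUS):
                        hits.append((member.name, name))
    return hits
```

A file deleted in a later layer still shows up here, because the earlier layer that added it is stored in the tarball unchanged.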

I’m afraid you’re missing my point, though. A high quality build system takes fixed inputs and produces outputs that are, to the extent possible, only a function of the inputs. If there’s a separate process that downloads the inputs (and preferably makes sure they are bitwise identical to what is expected), fine, but that step should be strictly outside the inputs to the actual thing that produces the release artifact. Think of it as:

    artifact = build_process(inputs)

    inputs = fetch(credentials, cache, hashes, etc)

Or, even better perhaps:

    inputs = …
    assert hash(inputs) == expected

(And now, unless you accidentally hash your credentials into the expected hash, you can’t leak credentials into the output!)
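
To make the pseudocode above concrete, here is a minimal Python sketch. The dict-of-bytes shape for the inputs and the names verify_inputs and build_process are assumptions for illustration only; the point is that the build function takes verified inputs and nothing else, so there is no parameter through which credentials could reach it.

```python
import hashlib


def verify_inputs(blobs, expected):
    """Check each fetched blob against its pinned SHA-256 digest.

    `blobs` maps a name to fetched bytes; `expected` maps the same names
    to hex digests. Both shapes are hypothetical, chosen for this sketch.
    Returns the blobs unchanged, or raises ValueError on the first mismatch.
    """
    if blobs.keys() != expected.keys():
        raise ValueError("fetched inputs do not match the pinned set")
    for name, data in blobs.items():
        digest = hashlib.sha256(data).hexdigest()
        if digest != expected[name]:
            raise ValueError(f"hash mismatch for {name}: got {digest}")
    return blobs


def build_process(inputs):
    """Stand-in for the real build: a pure function of its inputs.

    It only hashes the inputs in a fixed order, which is enough to show
    that the same inputs always give the same "artifact".
    """
    h = hashlib.sha256()
    for name in sorted(inputs):
        h.update(name.encode())
        h.update(inputs[name])
    return h.hexdigest()
```

Whatever does the fetching can hold credentials; by the time verify_inputs returns, only the pinned bytes are left.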

Once you have commingled it so that it looks like:

    final_output, intermediate_layers = monolithic_mess(credentials, cache, etc)

then you completely lose track of which parts are deterministic, what lives in the intermediate layers, where the credentials go, and so on.

Docker build is not a good build system, and it strongly encourages users to do this the wrong way. There are many, many things wrong with it, and only one of them is that the intermediate layers you might think of as a cache are also exposed as part of the output.

It was confusing of you to use "build artifact" to refer to the container itself in this context. Sure, you're not wrong because the container is also a build artifact, but in context of CI, build artifacts are the output of running the build using the container.

Hence my confusion about what you meant -- no one's saying ssh keys are in the CI build artifacts. But obviously they can be in the container as layers if people do it wrong, which is bad.

We're talking about the same thing basically. Yes fully defining your inputs to the container by passing in the keys is a good solution.

I think there's a lot of confusing terminology in your comment.

> the container is also a build artifact

By "build artifact" I mean the data that is the output of the build and gets distributed to other machines (or run locally perhaps). So a build artifact can be a tarball, an OCI image [0], etc. But calling a container a build artifact is really quite strange. A "container" is generally taken to mean the thing you might see in the output of 'docker container ls' or similar -- it's a whole pile of state including a filesystem, a bunch of volume mounts, and some running processes if it's not stopped. You don't distribute containers to other machines [1].

> in context of CI, the output of running the build using the container

I have no idea what you mean. What container? CI doesn't necessarily involve containers at all.

> no one's saying ssh keys are in the CI build artifacts. But obviously they can be in the container as layers if people do it wrong, which is bad.

If the build artifact is an image, and the keys are in the image, then the keys are in the build artifact.

> Yes fully defining your inputs to the container by passing in the keys is a good solution.

Are you suggesting doing a build by an incantation like:

    $ docker run --rm -v [sources]:/input -v "/keys:($HOME)/.ssh" my_builder:latest /input/build_my_thing

This is IMO a terrible idea. A good build system DOES NOT PROVIDE KEYS TO THE BUILD PROCESS.

Yes, I realize that almost everyone fudges this because we have lots of tools that make it easy. Even really modern stuff like uv does this.

    $ uv build

Whoops, that uses optional credentials, fetches (hopefully locked-by-hash) dependencies, and builds. It's convenient for development. But for a production build, this would be much better if it were cleanly split into a fetch-the-dependencies step and a build step, with the build step running without network access or any sort of credentials.
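
As a partial illustration of a build step that runs without credentials, here is a Python sketch that launches a build command with an allowlisted environment. The ALLOWED_ENV list and the run_build_step name are assumptions of mine. It only covers credentials passed through environment variables; it does nothing about network access or key files on disk, which need a sandbox or container with networking disabled.

```python
import os
import subprocess

# Assumption: the only variables the build legitimately needs. Everything
# else, including tokens and SSH agent sockets, is dropped. HOME is left
# out on purpose so tools do not go looking in the user's home directory.
ALLOWED_ENV = ("PATH", "LANG")


def run_build_step(argv, cwd=None):
    """Run a build command with an allowlisted environment.

    Raises subprocess.CalledProcessError if the command fails, and
    returns the CompletedProcess (with captured text output) otherwise.
    """
    env = {k: os.environ[k] for k in ALLOWED_ENV if k in os.environ}
    return subprocess.run(
        argv, cwd=cwd, env=env, check=True, capture_output=True, text=True
    )
```

The fetch step would run separately, with whatever credentials it needs, and hand over only the downloaded files.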

[0] https://specs.opencontainers.org/image-spec/

[1] A build artifact could be a container snapshot, but that's different.

Container is standard terminology to refer to a running instance of an image. Yes, I was being imprecise; read "OCI image" wherever I wrote "container". But you seem hung up on frivolity and not getting what I'm saying. We are agreeing with each other and just talking in circles. I can see that you don't see that, but that's ok. All of this was because I misunderstood what you said initially: you used "build artifact" to mean the OCI image, and I thought you were talking about other sorts of build artifacts.

I mean using the CI system to pass in keys or creds. Yes, it's better to build the image with dependencies, but sometimes you can't do that.

I hate the nanny-state behavior of docker build: not being allowed to modify files/data outside of the build container and cache, like having an NFS mount for sharing data in the build or copying files out of the build.

Let me have side effects, I'm a consenting adult and understand the consequences!!!