Hacker News
by amluto 110 days ago
There is something wrong with the industry in which we think that, when a production build requires SSH keys, the problem is that the keys might leak into the build artifact.

Keys leaking into the build artifact was never the concern.

It's about not having the private keys stored unknowingly in intermediate layers of a build container.

Those intermediate layers are usually part of the artifact. Try exporting an image with docker save and investigate what’s inside. This is all documented in a mostly comprehensible manner in the OCI specs.
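
For anyone who wants to try that without unpacking things by hand, here is a rough Python sketch. It assumes the tarball came from `docker save`, that it has a top-level `manifest.json` whose entries list their layer tarballs under `"Layers"`, and that a key would have one of a few well-known file names. The function name and the name list are made up for illustration, and the exact layout can vary between Docker versions.

```python
import io
import json
import tarfile

# Assumption: file names worth flagging. Real keys can be called anything.
SUSPICIOUS_NAMES = ("id_rsa", "id_ed25519", "id_ecdsa")


def find_key_files(image_tar):
    """Return (layer, path) pairs for key-like files in any layer of the image."""
    hits = []
    with tarfile.open(image_tar) as image:
        manifest = json.load(image.extractfile("manifest.json"))
        for entry in manifest:
            for layer_name in entry["Layers"]:
                # Each layer is itself a tarball (possibly compressed).
                layer_bytes = image.extractfile(layer_name).read()
                with tarfile.open(fileobj=io.BytesIO(layer_bytes)) as layer:
                    for member in layer.getmembers():
                        basename = member.name.rsplit("/", 1)[-1]
                        if basename in SUSPICIOUS_NAMES:
                            hits.append((layer_name, member.name))
    return hits
```

A file that a later layer deletes still shows up here, because the deletion is only recorded as a marker in the later layer while the earlier layer keeps the original bytes.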

I’m afraid you’re missing my point, though. A high quality build system takes fixed inputs and produces outputs that are, to the extent possible, only a function of the inputs. If there’s a separate process that downloads the inputs (and preferably makes sure they are bitwise identical to what is expected), fine, but that step should be strictly outside the inputs to the actual thing that produces the release artifact. Think of it as:

    artifact = build_process(inputs)

    inputs = fetch(credentials, cache, hashes, etc)

Or, even better perhaps:

    inputs = …
    assert hash(inputs) == expected

(And now, unless you accidentally hash your credentials into the expected hash, you can’t leak credentials into the output!)
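
To make the hash check concrete, here is a minimal Python sketch under some assumptions: the inputs are files in one directory, the expected hashes are SHA-256 hex digests keyed by relative path, and `build_process` is only a stand-in for a real deterministic build. All of the names are hypothetical.

```python
import hashlib
from pathlib import Path


def verify_inputs(input_dir: Path, expected: dict[str, str]) -> dict[str, bytes]:
    """Return the input files' contents, or raise if anything differs from the pins."""
    present = {
        p.relative_to(input_dir).as_posix()
        for p in input_dir.rglob("*")
        if p.is_file()
    }
    unexpected = present - set(expected)
    missing = set(expected) - present
    if unexpected or missing:
        # A stray credentials file in the input directory lands here.
        raise ValueError(f"unexpected={sorted(unexpected)} missing={sorted(missing)}")
    inputs = {}
    for name, want in sorted(expected.items()):
        data = (input_dir / name).read_bytes()
        if hashlib.sha256(data).hexdigest() != want:
            raise ValueError(f"hash mismatch for {name}")
        inputs[name] = data
    return inputs


def build_process(inputs: dict[str, bytes]) -> bytes:
    """Stand-in build: the output depends only on the verified inputs."""
    out = hashlib.sha256()
    for name in sorted(inputs):
        out.update(name.encode() + b"\0" + inputs[name])
    return out.hexdigest().encode()
```

The fetch step can use whatever credentials it needs to populate the directory. The build only ever sees the dictionary that `verify_inputs` returns, so a file that was never pinned cannot reach it.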

Once you have commingled it so that it looks like:

    final_output, intermediate_layers = monolithic_mess(credentials, cache, etc)

Then you completely lose track of which parts are deterministic, what lives in the intermediate layers, where the credentials go, etc.

Docker build is not a good build system, and it strongly encourages users to do this the wrong way. There are many, many things wrong with it, and only one of them is that the intermediate layers that you might think of as a cache are also exposed as part of the output.

It was confusing of you to say "build artifact" to refer to the container itself in this context. Sure, you're not wrong, because the container is also a build artifact, but in the context of CI, build artifacts are the output of running the build using the container.

Hence my confusion about what you meant -- no one's saying ssh keys are in the CI build artifacts. But obviously they can be in the container as layers if people do it wrong, which is bad.

We're talking about the same thing basically. Yes fully defining your inputs to the container by passing in the keys is a good solution.

I think there's a lot of confusing terminology in your comment.

> the container is also a build artifact

By "build artifact" I mean the data that is the output of the build and gets distributed to other machines (or run locally perhaps). So a build artifact can be a tarball, an OCI image [0], etc. But calling a container a build artifact is really quite strange. A "container" is generally taken to mean the thing you might see in the output of 'docker container ls' or similar -- it's a whole pile of state including a filesystem, a bunch of volume mounts, and some running processes if it's not stopped. You don't distribute containers to other machines [1].

> in context of CI, the output of running the build using the container

I have no idea what you mean. What container? CI doesn't necessarily involve containers at all.

> no one's saying ssh keys are in the CI build artifacts. But obviously they can be in the container as layers if people do it wrong, which is bad.

If the build artifact is an image, and the keys are in the image, then the keys are in the build artifact.

> Yes fully defining your inputs to the container by passing in the keys is a good solution.

Are you suggesting doing a build by an incantation like:

    $ docker run --rm -v [sources]:/input -v "/keys:${HOME}/.ssh" my_builder:latest /input/build_my_thing

This is IMO a terrible idea. A good build system DOES NOT PROVIDE KEYS TO THE BUILD PROCESS.

Yes, I realize that almost everyone fudges this because we have lots of tools that make it easy. Even really modern stuff like uv does this.

    $ uv build

Whoops, that uses optional credentials, fetches (hopefully locked-by-hash) dependencies, and builds. It's convenient for development. But for a production build, this would be much better if it were cleanly split into a fetch-the-dependencies step and a build step, with the build step running without network access or any sort of credentials.
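
As a rough illustration of the "no credentials" half of that split, here is a Python sketch that runs a build command with a scrubbed environment. The allow-list and the function names are assumptions made for the example. It does not cut off network access, which would need an actual sandbox (for instance a container run with networking disabled).

```python
import os
import subprocess
import sys

# Assumption: the only inherited variables the build legitimately needs.
ALLOWED_ENV = ("PATH", "LANG", "TZ")


def scrubbed_env(environ):
    """Keep only allow-listed variables, dropping tokens, agent sockets, etc."""
    return {k: v for k, v in environ.items() if k in ALLOWED_ENV}


def run_build_step(cmd, input_dir, environ=os.environ):
    """Run the build inside input_dir with no inherited credentials."""
    env = scrubbed_env(environ)
    # Point HOME at the inputs so ~/.ssh, ~/.netrc and friends are not visible.
    env["HOME"] = str(input_dir)
    return subprocess.run(
        cmd,
        cwd=input_dir,
        env=env,
        check=True,
        capture_output=True,
        text=True,
    )
```

The fetch step would run separately, with credentials, and leave its verified outputs in `input_dir` before this is called.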

[0] https://specs.opencontainers.org/image-spec/

[1] A build artifact could be a container snapshot, but that's different.

Container is standard terminology for a running instance of an image. Yes, I was being imprecise; substitute "OCI image" for "container". But you seem hung up on frivolity and aren't getting what I'm saying. We are agreeing with each other and just talking in circles. I can see that you don't see that, but that's OK. All of this came from my misunderstanding of your initial comment: you used "build artifact" to mean the OCI image, and I thought you were talking about other sorts of build artifacts.

I mean using the CI system to pass in keys or creds. Yes, it's better to build the image with dependencies, but sometimes you can't do that.