by jacques_chester 2633 days ago
> But those problems are independent of build frontends: you could solve them once for both buildpacks and Dockerfiles.

You can build an image with either technology, but that's not the crux of the argument. The key point is that every Dockerfile is unique and potentially quite different from every other Dockerfile. Small differences in layer order and layer contents multiply into very large inefficiencies at scale.
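
One concrete reason order matters: the OCI image spec identifies a stack of layers by its ChainID, which folds in the DiffID of every layer beneath it, and local layer stores such as Docker's key unpacked layers this way. A minimal Go sketch of that recursion, with hypothetical layer contents standing in for real uncompressed tarballs, shows that the same three layers in a different order share only their base:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// digest returns an OCI-style "sha256:<hex>" digest of b.
func digest(b []byte) string {
	return fmt.Sprintf("sha256:%x", sha256.Sum256(b))
}

// chainID follows the recursive definition in the OCI image config spec:
//   ChainID(L0)        = DiffID(L0)
//   ChainID(L0|...|Ln) = Digest(ChainID(L0|...|Ln-1) + " " + DiffID(Ln))
func chainID(diffIDs []string) string {
	if len(diffIDs) == 0 {
		return ""
	}
	id := diffIDs[0]
	for _, d := range diffIDs[1:] {
		id = digest([]byte(id + " " + d))
	}
	return id
}

// reusablePrefix reports how many leading layers two stacks can share,
// i.e. how far their ChainIDs agree from the bottom up.
func reusablePrefix(x, y []string) int {
	n := 0
	for n < len(x) && n < len(y) && chainID(x[:n+1]) == chainID(y[:n+1]) {
		n++
	}
	return n
}

func main() {
	// Hypothetical layer contents; real DiffIDs are digests of
	// uncompressed layer tarballs.
	base := digest([]byte("base OS files"))
	deps := digest([]byte("node_modules"))
	app := digest([]byte("application code"))

	x := []string{base, deps, app}
	y := []string{base, app, deps} // same layers, different order

	fmt.Println("top-of-stack ChainIDs equal:", chainID(x) == chainID(y))
	fmt.Println("layers reusable as a shared stack:", reusablePrefix(x, y))
}
```

Registry blob storage is keyed by each layer's own content digest, so this order effect shows up mainly in build caches and local layer stores.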

The way you tackle this problem is to make the ordering and contents of layers predictable for any given software being built. You can achieve this with Dockerfiles by using golden images, strict control of access to Docker Hub, complicated `FROM` hierarchies, the whole shebang.

But at that point you are reinventing buildpacks, at your own expense.

Note that this holds with or without buildkit.

> a new build format, and a new build implementation. Docker is going in the opposite direction by unbundling the format (Dockerfile) from the implementation (buildkit).

Is your understanding that we invented a new image format, or that we rewrote most of Docker? Or that the way we've written it prevents, for all time and for all purposes, adopting buildkit as part of the system in future?

Because all of those are misapprehensions. We have extensively reused code and APIs from Docker, especially the registry API.


> Small differences in layer order and layer contents multiply to very large inefficiencies at scale.

Can you provide an example of how layer order can cause an issue?

Consider:

    FROM node
    COPY /mycode /app
    RUN npm install /app

Now suppose I change my app code. With a Dockerfile, that change alters the files brought in by `COPY`, which invalidates the cached `RUN npm install /app` layer, even if I didn't change anything NPM cares about (such as `package.json`).

An NPM buildpack can signal that nothing relevant has changed, allowing the overall lifecycle to skip rebuilding that layer.
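
A simplified Go model of the two caching behaviours. The Dockerfile side assumes the usual chained cache, where each step's key folds in its parent's key and `COPY` hashes the copied files; the buildpack side assumes a hypothetical NPM buildpack that checks only `package.json` and `package-lock.json`. Neither is the actual Docker or buildpack implementation, and the file contents are made up:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// hash returns a hex SHA-256 over the given parts, each followed by a NUL byte.
func hash(parts ...string) string {
	h := sha256.New()
	for _, p := range parts {
		h.Write([]byte(p))
		h.Write([]byte{0})
	}
	return fmt.Sprintf("%x", h.Sum(nil))
}

// treeHash hashes a path -> contents map in sorted path order.
func treeHash(files map[string]string) string {
	paths := make([]string, 0, len(files))
	for p := range files {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	parts := make([]string, 0, 2*len(paths))
	for _, p := range paths {
		parts = append(parts, p, files[p])
	}
	return hash(parts...)
}

// dockerfileNpmKey models the chained cache for FROM / COPY / RUN npm install:
// any change to the copied tree changes the key of every step after COPY.
func dockerfileNpmKey(src map[string]string) string {
	from := hash("FROM base image")
	cp := hash(from, "COPY /mycode /app", treeHash(src))
	return hash(cp, "RUN npm install /app")
}

// buildpackNpmKey models a hypothetical NPM buildpack that decides whether
// its dependency layer is stale by looking only at the npm manifests.
func buildpackNpmKey(src map[string]string) string {
	return hash("npm-deps", src["package.json"], src["package-lock.json"])
}

func main() {
	v1 := map[string]string{
		"package.json":      `{"dependencies":{"express":"^4.18.0"}}`,
		"package-lock.json": "lockfile contents",
		"index.js":          "console.log('hello')",
	}
	// Edit application code only; the npm manifests are untouched.
	v2 := map[string]string{
		"package.json":      v1["package.json"],
		"package-lock.json": v1["package-lock.json"],
		"index.js":          "console.log('hello, world')",
	}

	fmt.Println("Dockerfile npm layer reusable:", dockerfileNpmKey(v1) == dockerfileNpmKey(v2))
	fmt.Println("Buildpack npm layer reusable: ", buildpackNpmKey(v1) == buildpackNpmKey(v2))
}
```

Running it reports that the Dockerfile's npm layer must be rebuilt while the buildpack's can be reused.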

There's also the problem of efficient composition. Suppose I have this:

    RUN wget https://example.com/popular-shell-script.sh && \
        go install git.example.com/something-else@abc123 && \
        sh ./popular-shell-script.sh && \
        rm ./popular-shell-script.sh

And this:

    RUN go install git.example.com/something-else@abc123

Both of the resulting images will contain the same `something-else` binary, and in an ideal world of file-level manifests I could save on rebuilds and bandwidth consumption (NixOS has this, approximately).

But I don't get to do that, because the layers have different overall contents and therefore different digests. Buildpacks don't get you all the way to a file-centric approach, but because they follow a repeatable, controlled pattern for selecting the contents and order of layers, they greatly improve layer reuse across many images.
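
A toy Go sketch of the composition point. It treats a layer as a hypothetical path-to-contents map and digests it as a whole, roughly how layer tarball digests behave: the combined `RUN` layer and the `something-else`-only layer contain the same file but share no digest, whereas always placing `something-else` in its own layer lets both images share that layer. The paths and contents are made up for illustration:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// layer is a simplified image layer: a set of path -> file contents.
type layer map[string]string

// digest hashes a layer's files in sorted path order, standing in for
// the digest of a layer tarball.
func (l layer) digest() string {
	paths := make([]string, 0, len(l))
	for p := range l {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	h := sha256.New()
	for _, p := range paths {
		fmt.Fprintf(h, "%s\x00%s\x00", p, l[p])
	}
	return fmt.Sprintf("%x", h.Sum(nil))
}

// sharedDigests counts layers in a that have a byte-identical layer in b.
func sharedDigests(a, b []layer) int {
	seen := map[string]bool{}
	for _, l := range b {
		seen[l.digest()] = true
	}
	n := 0
	for _, l := range a {
		if seen[l.digest()] {
			n++
		}
	}
	return n
}

func main() {
	// Hypothetical file contents.
	binary := "something-else binary built at abc123"
	script := "output of popular-shell-script.sh"

	// Dockerfile style: each RUN yields one layer holding everything it wrote.
	imgA := []layer{{"/go/bin/something-else": binary, "/usr/local/share/tool": script}}
	imgB := []layer{{"/go/bin/something-else": binary}}

	// Controlled style: the same dependency always lands in its own layer.
	imgC := []layer{{"/go/bin/something-else": binary}, {"/usr/local/share/tool": script}}

	fmt.Println("shared layers, one layer per RUN:", sharedDigests(imgA, imgB))
	fmt.Println("shared layers, one layer per dependency:", sharedDigests(imgC, imgB))
}
```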