Hacker News
by _0w8t 3714 days ago
A common problem with Docker is that after running a compiler/preprocessor during an image build, one ends up with the build tools inside the image. A workaround is to use a shell script that first gets the image with the compiler, runs it, and then passes the output to the Docker build. But this is non-standard and encourages running random scripts on the build host, defeating the advantage of using Docker during development to isolate the build system. It is nice that Nix addresses this.

Or you could remove the tools at the end of your Dockerfile...

Edit: am I missing something? This is a legitimate solution to the problem. Install the tools, compile, and remove them. The parent is suggesting a very clumsy approach (build on the host and pass the binary to the container as it's being built).

(I didn't downvote you)

Disclaimer: I work at Docker.

Your approach is the logical one... But Docker currently has a limitation in how it handles removing files in a build. After each build step, the intermediary state is committed as a layer, just like a git commit. So removing files in a docker build is like removing files in git: they are still taking up space in the history.

The long-term solution is to support image "squashing" or "flattening" in docker-build.

A less clumsy short-term solution is to build a Docker image of the build environment, then 'docker run' that image to produce the final artifact. At least that way you get rid of the dependency on the host, which keeps your build more portable (if not as convenient as a single 'docker build').

Our approach is to view Docker as part of our overall development process and then develop stage-specific containers.

For example, we have development containers, build containers and runtime containers. Runtime containers are further segmented into product demo containers, testing containers and production containers.

I just published a new article on Docker this morning: http://www.dynomitedb.com/blog/2016/04/13/docker-containers/

An important point is that the build containers produce binaries that are used both in native package managers (e.g. apt) and in Docker containers.

If you're interested in seeing this in action, then check out our source on GitHub: https://github.com/DynomiteDB

IMHO, a well-designed approach to UnionFS layers is vital to high-quality container architecture.

While we're focused on container use for databases (both in-memory and on-disk), much of our approach applies equally well to application layer containers.

Nice reference to https://www.projectcalico.org. At some point the insanity of using Ethernet on top of UDP to carry IP traffic between containers must stop.
Straight from the horse's mouth--admire your product, Mr. Hykes!

I love how you can run docker inside of a container. What I've done sometimes is run docker inside my build environment container. I use Docker Machine (OS X), so I just send the same machine environment variables over to the container, but on Linux you could just link the socket file. In fact, I have a container just for Google Cloud that maintains my GKE config and makes it easy for me to prepare new deployments to the cloud.
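
A minimal sketch of what "sending the environment over" or "linking the socket" could look like, written as a Python helper that only assembles the `docker run` command line. It is not necessarily what the commenter runs: the image name `my-build-env` is hypothetical, the image is assumed to have the docker CLI installed, and `/var/run/docker.sock` is assumed to be the daemon socket (the usual default on Linux). The three `DOCKER_*` variables are the ones Docker Machine exports.

```python
import os

DOCKER_SOCKET = "/var/run/docker.sock"  # assumed default daemon socket on Linux
MACHINE_VARS = ("DOCKER_HOST", "DOCKER_TLS_VERIFY", "DOCKER_CERT_PATH")


def build_env_command(image, env=None, use_socket=True):
    """Return an argv list that starts `image` with access to the outer Docker daemon."""
    env = os.environ if env is None else env
    cmd = ["docker", "run", "--rm"]
    if use_socket:
        # Linux case: bind-mount the daemon socket into the container.
        cmd += ["-v", f"{DOCKER_SOCKET}:{DOCKER_SOCKET}"]
    else:
        # Docker Machine case: forward the client settings that are set.
        for name in MACHINE_VARS:
            if name in env:
                cmd += ["-e", f"{name}={env[name]}"]
        cert_path = env.get("DOCKER_CERT_PATH")
        if cert_path:
            # The TLS certificates must also be readable inside the container.
            cmd += ["-v", f"{cert_path}:{cert_path}:ro"]
    cmd.append(image)
    return cmd


if __name__ == "__main__":
    print(" ".join(build_env_command("my-build-env")))  # hypothetical image name
```

Either way, the docker client inside the container talks to the daemon outside it, so images built there end up in the outer daemon's image store.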

Could you elaborate on this process of deploying your container from inside docker via socket linking? I'm not sure I follow.
> But Docker currently has a limitation in how it handles removing files in a build. After each build step, the intermediary state is committed as a layer, just like a git commit. So removing files in a docker build is like removing files in git: they are still taking up space in the history.

That's true unless you do the "everything in a single RUN statement" trick that is very popular.

"Very popular" for the one case of installing things via package manager.
The proper solution is to build packages first using a build environment, then install the built binary packages in the container, like any other package.
There's also another workaround besides the ones mentioned by other posters: you could install your compilers, do your build, and clean up all your build tools within just one shell script invoked by a RUN line in the Dockerfile. It's not pretty, but it works.
Yes, that is a common workaround as well.

By the way: there is an open request for contributions to help improve this. The core Docker team very much wants to improve squashing in build, but it's a matter of time and resources.

If somebody cares enough to take the time to carry a design proposal and then a patch, we would be happy to support that effort!

The idea is not to build anything on the host. Rather, it is more like a staged build initiated through a shell script. First pull/build the image with the compiler, then `docker run` it to compile the application, and finally build the final image with the application.
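
The three stages could be sketched roughly as follows, here as a small Python driver rather than a shell script. Everything named in it is an assumption for illustration: the image names, the `/src` and `/out` mount points, and the expectation that the output directory already holds a Dockerfile that copies the built artifact into a runtime-only image.

```python
import os
import subprocess


def staged_build_commands(builder_image, final_image, src_dir, out_dir, build_cmd):
    """Return the docker invocations for a pull -> compile -> package staged build."""
    src = os.path.abspath(src_dir)
    out = os.path.abspath(out_dir)
    return [
        # Stage 1: fetch the image that contains the compiler.
        ["docker", "pull", builder_image],
        # Stage 2: compile inside the builder; only /out is writable.
        ["docker", "run", "--rm",
         "-v", f"{src}:/src:ro", "-v", f"{out}:/out", "-w", "/src",
         builder_image] + list(build_cmd),
        # Stage 3: build the final image, using out_dir as the build context.
        ["docker", "build", "-t", final_image, out],
    ]


def run_staged_build(commands, runner=subprocess.check_call):
    """Run the stages in order; check_call stops at the first failing stage."""
    for cmd in commands:
        runner(cmd)


if __name__ == "__main__":
    # Hypothetical names; printed rather than executed.
    for cmd in staged_build_commands("gcc:latest", "myapp:latest", "./src", "./out",
                                     ["gcc", "-o", "/out/app", "main.c"]):
        print(" ".join(cmd))
```

The host only needs Docker itself; the compiler lives in the builder image and never reaches the final image.
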
The layered fs that Docker uses is based on additive snapshots, so removing tools at the end will paradoxically increase your image size with a useless snapshot.
You can do everything within a single RUN statement to avoid that.
Yes, but at the expense of readability, development speed, and incremental updates to images (where typically the dependencies layer changes slower than your target code).
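
To make the size argument in this exchange concrete, here is a toy model (not Docker's actual storage driver) in which each build step commits a layer holding whatever files are new or changed at the end of that step. The paths and sizes are made up, and the small cost of recording a deletion is ignored.

```python
def image_size(steps):
    """Total size of all layers for a list of build steps.

    Each step is a list of ("add", path, size) or ("rm", path) operations.
    """
    total = 0
    visible = {}  # path -> size, the filesystem as seen after the steps so far
    for step in steps:
        before = dict(visible)
        for op in step:
            if op[0] == "add":
                _, path, size = op
                visible[path] = size
            elif op[0] == "rm":
                visible.pop(op[1], None)
        # A layer stores only files that exist at the end of the step and
        # were absent (or had a different size) before it started.
        total += sum(size for path, size in visible.items()
                     if before.get(path) != size)
    return total


# Hypothetical sizes in MB.
separate_steps = [
    [("add", "/usr/bin/gcc", 300)],  # RUN install compiler
    [("add", "/app/server", 20)],    # RUN compile
    [("rm", "/usr/bin/gcc")],        # RUN remove compiler
]
single_step = [[op for step in separate_steps for op in step]]

print(image_size(separate_steps))  # 320: the compiler layer is still shipped
print(image_size(single_step))     # 20: the compiler never reaches a committed layer
```

The flip side is the one raised above: with a single step there is no cached dependency layer, so any change to the code reruns the whole step.
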
The idea is actually to do something similar to a Heroku buildpack, where you have a container with build tools that generates binary assets. You then inject the built binary into a new image that has only runtime dependencies installed.

I've experimented with (and use) a variant of this workflow myself, built around my marina tool [1]. The basic idea is to define a file that uses a dev/builder image to build, then exports a tarball into a runner image.

[1] https://pypi.python.org/pypi/marina

It is much easier to just use a standard package format (rpm/deb/apk/etc.) and a standard installer (yum/dnf/apt/apk/etc.). Of course, you can invent your own build and installation system; it will work too.
That depends on a lot of factors. The advantage of an approach like this is that every package is built in a clean-room container independent of the host. For example, my host is OS X and I'm building binary tarballs to run on Ubuntu. If you have a build server, obviously this is less of an issue.
Just build .debs instead of tarballs. Use "alien" to convert .tgz into .deb, for example. I see no reason to invent my own build system, package format and package management software. I have been building my rpm packages in a clean-room chroot (using mock) for about a decade. It works fine in docker too.
Nix has been a very cool project to watch over the years.

You can address part of the problem of picking up extra data in final images by declaring temporary build locations, such as `/var/lib/cache`, as a volume. Anything written to a volume won't be included in the final image.