by zomglings 2041 days ago
Why do you say that docker-in-docker buys him nothing? It's not obvious at all and you go into no detail whatsoever to back up your opinion.

In my experience, that is not true at all. Docker-in-docker allows me to deliver smaller images that can fit into a CI flow as language plugins instead of shipping a beastly 5G docker image with every possible language runtime I need to support for my CI tool.
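To make the plugin idea concrete, here is a rough Python sketch of how a CI tool might pick one small image per language instead of a single giant one. The image names and the /workspace mount point are made up for illustration; only the `docker run` flags (`--rm`, `-v`, `-w`) are real. It builds the command lines without running anything.

```python
# Hypothetical plugin registry: these image names are placeholders, not real images.
PLUGIN_IMAGES = {
    "python": "example/ci-plugin-python:latest",
    "go": "example/ci-plugin-go:latest",
    "node": "example/ci-plugin-node:latest",
}


def plugin_commands(languages, workspace):
    """Build one `docker run` argv per language the project declares."""
    commands = []
    for lang in languages:
        image = PLUGIN_IMAGES[lang]  # KeyError means no plugin exists for that language
        commands.append([
            "docker", "run", "--rm",
            "-v", f"{workspace}:/workspace",  # mount the checkout (assumed path)
            "-w", "/workspace",               # run the plugin from the checkout
            image,
        ])
    return commands


if __name__ == "__main__":
    for argv in plugin_commands(["python", "go"], "/builds/myrepo"):
        print(" ".join(argv))
```

Each project then only pulls the plugins it actually needs.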

It is because building an image with docker requires the docker client to talk to a dockerd daemon, so one has to configure the client to access dockerd, which allows untrusted code to run as root on the host.

Docker-in-docker is a workaround to make docker work in CI.

Basically a security nightmare and bad design that podman doesn't have.

Any build script can do serious damage to the environment it runs in. Before docker, you'd have to create a new VM from time to time because the build agent had rotted away or died in an altercation with a bad build.

Docker in Docker in CI is like a lock on a door. It keeps honest people from being naughty, and is fairly efficient about it.

I don't think the question is "should I run CI in docker in docker," it's "whose CI should I run in docker in docker." Me and my coworkers can share docker images. Customers or freeloaders cannot. So if that's in your problem domain, then you're right, it's a bad idea. But it isn't for most people.

You do know that spinning up a new VM only takes a few seconds? With projects like https://firecracker-microvm.github.io/, the difference between launching a new Docker container or a new VM is negligible.

This works great if you own or rent the hardware, but most cloud providers don't allow nested virtualization.

The cost is not spinning up the vm, it’s maintaining the images. Docker composability reduces the combinatorics problem to a dull roar, and democratizes some of the maintenance effort. You want an image with the bug fix from the latest point release of python? And you need it by noon? Knock yourself out.

Although there are tools to convert docker images to vm images. I expect if I were running community CI infrastructure, getting really familiar with those would be high on my priority list.

The other option that works really well in a single user environment is to bind to the runner's Docker daemon. That way builds run as siblings of the runner's daemon rather than as children via docker-in-docker.
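For anyone who hasn't seen it, binding to the runner's daemon usually means bind-mounting the Docker socket into the build container. A minimal Python sketch that only assembles the command line; the image name in the usage note is a placeholder and I'm assuming the default Linux socket path:

```python
DOCKER_SOCKET = "/var/run/docker.sock"  # default socket path on Linux (assumption)


def sibling_build_argv(image, command, socket_path=DOCKER_SOCKET):
    """Argv for a build container that talks to the runner's own daemon."""
    return [
        "docker", "run", "--rm",
        # Bind-mount the runner's socket so a docker client inside the
        # container reaches the same daemon; anything it starts becomes a
        # sibling of the build container, not a child.
        "-v", f"{socket_path}:{DOCKER_SOCKET}",
        image,
        *command,
    ]


if __name__ == "__main__":
    argv = sibling_build_argv(
        "example/build-image:latest",  # hypothetical image with the docker CLI installed
        ["docker", "build", "-t", "myapp", "."],
    )
    print(" ".join(argv))
```

Whoever can write to that socket can ask the daemon to do anything, so the build effectively has root on the runner.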

The huge issue with that is security which is why it's only really practical for a single user or a small group of trusted users. A secondary issue is that (I think) builds can't run simultaneously because they can trample each other when tagging images (since all images are on the runner's daemon).
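One common way around the tagging collisions (a sketch, not something I've verified on every CI system) is to give each build its own tag on the shared daemon. The `ci-` prefix and the idea of passing in the CI job id are my assumptions:

```python
import re
import uuid


def unique_tag(repository, build_id=None):
    """Return `repository:ci-<id>` so parallel builds do not share a tag."""
    build_id = build_id or uuid.uuid4().hex  # random id when the CI gives none
    # Docker tags allow letters, digits, '_', '.' and '-'; replace anything else.
    safe_id = re.sub(r"[^A-Za-z0-9_.-]", "-", str(build_id))
    tag = f"ci-{safe_id}"[:128]  # tags are limited to 128 characters
    return f"{repository}:{tag}"


if __name__ == "__main__":
    print(unique_tag("myapp", "job/42"))
    print(unique_tag("myapp"))
```

The build would then use that tag for `docker build -t` and clean it up afterwards, so two jobs building the same repository never overwrite each other's image.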

If I had to build a Docker focused CI system I'd think about using Weave Ignite (AWS Firecracker) to spin up VMs for runners with the Docker socket bound like described above. That way you get all the convenience of binding the Docker socket, but the isolation of a VM that gets thrown away after the build step (or pipeline) finishes. That idea also fits well with local running / debugging IMO because you can bind to the Docker socket on your development workstation (assuming you're not running a large build of parallel tasks which might be an unrealistic assumption).

For us it’s a matter of the CI tool fetching the source code for the docker image, then running docker build, and not necessarily immediately. So you have ‘docker build’ happening toward the end of a set of other tasks. Which I’d really like to have running on a fresh VM or container.

You could separate those into two builds, but the reason they are together is so people think about deployment, and in case any structural changes to the code need to coincide with deployment changes. For instance, breaking changes in APIs. I need a new version of tool/library and I need to change how I call it.

Kubernetes as a layer of indirection is another solution.
Kubernetes is a consistent management api for linux (so then you don't need to interact with iptables, mount and all that "hard stuff").

I don't think kubernetes is a solution in the context of building an image (a rootfs tree into a .tar.gz file).

Unless you are using kaniko which extends the kubernetes api to add the capability of creating images, but that is handled by kaniko itself via the same api.

I was probably unclear. Kubernetes is a good solution for managing containers (obviously). I use it for CI and it works very, very well, though the CI tools still have more features they could add with the integration.

> beastly 5G docker image

my beastly 12GB image that even includes Matlab wants a word with you

>> beastly 5G docker image
> my beastly 12GB image that even includes Matlab wants a word with you

Perhaps in the next 10 years we will be rediscovering packages. :P

If you are in the business of charging complex prices per bits over the network, then docker seems to be quite a good investment and making it as popular as possible is a good strategy to print money. /s

> If you are in the business of charging complex prices per bits over the network, then docker seems to be quite a good investment and making it as popular as possible is a good strategy to print money. /s

True, that.

To be fair, at least it allows me to avoid lots of the brokenness of Python packaging.

It's always good to report packaging bugs so that people can fix them. Do you have examples of Python packages that could be improved?

See my previous comments.

tl;dr pip silently breaks my environments, mostly connected to upgrading numpy and other scientific/data science libraries.
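If you want to at least notice when that happens, a small check like the one below can compare what is installed against the versions you expect. This is a minimal sketch assuming Python 3.8+ (for `importlib.metadata`); the pinned version numbers are made-up examples, not recommendations.

```python
from importlib import metadata


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_drift(pins, lookup=installed_version):
    """Compare expected pins against what is actually installed.

    Returns {name: (expected, actual)} for every mismatch.
    """
    drift = {}
    for name, expected in pins.items():
        actual = lookup(name)
        if actual != expected:
            drift[name] = (expected, actual)
    return drift


if __name__ == "__main__":
    # Hypothetical pins, for illustration only.
    pins = {"numpy": "1.18.1", "scipy": "1.4.1"}
    for name, (expected, actual) in find_drift(pins).items():
        print(f"{name}: expected {expected}, found {actual}")
```

Running it after every `pip install` turns a silent change into a visible one, though it does not stop pip from making the change in the first place.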

Well, to be fair, it is packages - I'm just using Docker (for this section of our stack) as a different sort of VM, essentially. It runs a service manager and a VNC X session, for chrissakes ;)
Our images are 35GB, and I've spent much of the last two weeks breaking up files so we don't hit the 8GB per file limit, and my next week will be trying to avoid hitting the per-layer limit.
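For what it's worth, the file splitting itself can be scripted. Below is a minimal Python sketch of one way to cut a big file into parts under a size cap; the `.partNNNN` naming is my own invention and I don't know how the parent actually does it. The 8GB figure is theirs; the function just takes whatever cap you pass in.

```python
import os


def split_file(path, max_bytes, out_dir, buffer_size=1024 * 1024):
    """Split `path` into numbered parts of at most `max_bytes` each."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.basename(path)
    parts = []
    with open(path, "rb") as src:
        index = 0
        while True:
            part_path = os.path.join(out_dir, f"{base}.part{index:04d}")
            written = 0
            with open(part_path, "wb") as dst:
                # Copy in small buffers so a huge file never sits in memory.
                while written < max_bytes:
                    chunk = src.read(min(buffer_size, max_bytes - written))
                    if not chunk:
                        break
                    dst.write(chunk)
                    written += len(chunk)
            if written == 0:
                os.remove(part_path)  # source exhausted; drop the empty part
                break
            parts.append(part_path)
            index += 1
    return parts
```

Concatenating the parts in order gives back the original file, so the image's entrypoint (or a later build step) can reassemble them.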