|
It has been a reasonably big issue. E.g. I kept seeing zombies with Consul for a while until we realised that every single Consul Docker container on Dockerhub just had Consul run as pid 1 in the container (this is a while ago, no idea if that's still the case), without realising that Consul health checks then could end up as zombies if you weren't very careful about how you wrote them (e.g. typical example: Spawning curl from a shell script, with a timeout on the health check that was shorter than any timeouts on the curl requests). It's usually fairly simple to fix (e.g. for Consul above, I raised it with the Consul guys and they said they'd look at adding waiting on children to it as a precaution - it's just a couple of lines -, but people building containers could also introduce a minimal init, or you can write your health checks to guard against it), but it happens all over the place, and people are often unaware and so not on the lookout for it and it may not be immediately obvious. The reason I raised it as an issue for Consul, for example, even though it wasn't really their fault, but an issue with the containers, is that people need to be aware of the problem when packaging the containers, need to be aware that a given application may spawn children, and that they may not wait for them. Even a lot of people aware of the zombie issue end up packaging software that they didn't realise where spawning child processes that could end up as zombies (in this case, it took running it in a container without a proper pid 1, using health checks which not everyone will do, and writing the health checks in a particular way in order to notice the effects). Thankfully there are a number of tiny little inits. E.g. there's suckless sinit [1], Tini[2] , and here's a tiny little proof of concept Go init [3] I wrote (though frankly, suckless or Tini compiled with musl will give you a much smaller binary) as what little you actually need to do is very trivial. [1] http://git.suckless.org/sinit [2] https://github.com/krallin/tini [3] https://gist.github.com/vidarh/91a110792c86d6c3bb41 |
Also thanks for the Consul example, makes it much-much easier to see the issue and argue for a general solution. (So not every random app/project/service/daemon has to implement pid1 functionality.)