A 1 KB Docker Container (blog.quickmediasolutions.com)
216 points by nathan-osman 3192 days ago
16 comments

As part of a competition before the last DockerCon I managed to get a container down to 69B

Details: http://thebsdbox.co.uk/in-pursuit-of-a-tinier-binary-er/

Code: https://gist.github.com/thebsdbox/29e395299f89b52214b66269f5...

Is "69 bytes" for the binary, or for the whole container image? I would expect at least some metadata.
Nice! I did something similar a few years back as a bit of a joke at work, and the best I could come up with was 345 bytes ...

    global _start
    _start:
        mov    rax,34 ; pause()
        syscall
Nice.
The smallest useful container I know of is 129B. It was created to test how many containers docker can spin up while reducing the overhead of what was in the container itself.

tianon/sleeping-beauty latest 2e8193709fa7 6 months ago 129B

https://github.com/tianon/dockerfiles/tree/master/sleeping-b...

Interesting. Incidentally,

> It was created to test how many containers docker can spin up

What was the answer? I'd think on the order of 200-300 on a server with 64 GB of RAM. (Pure guess!)

Depends on several other variables.

If you use default docker options, you'll be creating a veth pair per container. You might run into a limit there at around 1024 containers. You also might hit ulimit if your system isn't well configured.

If you use --net=none, you won't hit that issue, and you'll probably be able to manage quite a few more.

The resource usage ends up being roughly 4 KB of RSS for the executable in the container and around 3.5MB for the "containerd-shim" Go binary that parents the container.

"containerd" and "dockerd" both probably have a little extra resource usage per container they're managing, but I'd guess that's on the order of about 200KB per container at most.

The next big limit you'll hit is the process/pid limit (/proc/sys/kernel/pid_max) which defaults to 32k.

Fortunately, due to the memory overhead of a bit under 4MB per container, you probably won't get there on your 64GB RAM server and might cap out at around 15k containers total.

Experimentally, my Linux laptop (running Docker 17.06) is able to run 1100 copies of that sleeping-beauty container using almost exactly 2GB of additional RSS and no noticeable additional CPU.

This is even better than I calculated above, possibly due to shared memory for containerd-shim. I'm not investigating further.
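To make the arithmetic above concrete, here's a rough Python sketch. It takes the numbers from this comment (64GB of RAM, a bit under 4MB per container, the default pid_max of 32768, and ~2GB for 1100 containers) and treats GB/MB as binary units. The 2 PIDs per container is my own assumption (shim plus sleeper); since every thread also takes a PID, the real pid bound is likely lower. The veth and ulimit limits aren't modelled.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def container_ceiling(ram_bytes, per_container_bytes, pid_max=32768, pids_per_container=2):
    """Back-of-the-envelope (memory_bound, pid_bound, overall) container ceiling.

    pids_per_container=2 is an assumption (one containerd-shim plus the sleeping
    process). Threads consume PIDs too, so a multi-threaded shim raises the real
    per-container figure and lowers the pid bound. Veth/bridge and ulimit
    limits are not modelled.
    """
    memory_bound = ram_bytes // per_container_bytes
    pid_bound = pid_max // pids_per_container
    return memory_bound, pid_bound, min(memory_bound, pid_bound)


# The estimate from the comment: 64GB of RAM at roughly 4MB per container.
print(container_ceiling(64 * GIB, 4 * MIB))    # (16384, 16384, 16384)

# Extrapolating the laptop measurement: ~2GB of extra RSS for 1100 containers,
# i.e. about 1.86 MiB per container.
measured = (2 * GIB) // 1100
print(container_ceiling(64 * GIB, measured))   # (35200, 16384, 16384)
```

With the measured overhead, memory stops being the binding constraint on a 64GB box and pid_max takes over, even under the optimistic 2-PIDs assumption.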

What happens when you hit the limit? Thrashing swap or oom kills? (And what kind of overcommit policy did you use?)
I didn't hit any limit, but since I don't have swap it would likely be oom kills.

The first part of my post is speculation; for the second part I ran an arbitrary number of containers and observed resource usage to allow extrapolation, but didn't hit a limit.

Thanks for reproducing the experiment. :) I sent a message to tianon asking him if he remembers what the original numbers were. He told me long ago, I think it was around 1000. This was well before the go shim or dockerd existed.
thanks, this was interesting! That's better than I expected.
That is very low. The current official limit in OpenShift (a Kubernetes distro) is 250, but there have been lab clusters that have gone higher.
There are articles about running 2500 web server containers on a raspberry pi :)
I would expect a 64 gig machine to be able to host 200-300 VMs, and one or two orders of magnitude more containers.
I guess you don't have anything CPU intensive running.

General good practice for sysadmins is to set CPU affinity (pin VMs/processes to specific cores/threads) so that you don't end up flushing the CPU cache too aggressively, and to avoid splitting your VMs across NUMA zones, because memory locality is important.
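A minimal sketch of the pinning side in Python, assuming Linux (`os.sched_getaffinity` and `os.sched_setaffinity` are only available on some Unix platforms, Linux included). `pin_to_cpus` and `parse_cpulist` are hypothetical helper names; the parser handles the kernel's "0-3,8-11" range format, which is what /sys/devices/system/node/node0/cpulist uses to list the CPUs of NUMA node 0.

```python
import os


def parse_cpulist(text):
    """Parse the kernel's cpulist format (e.g. '0-3,8-11') into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_to_cpus(cpus, pid=0):
    """Restrict pid (0 = the calling process) to the requested CPUs it may use."""
    wanted = set(cpus) & os.sched_getaffinity(pid)
    if not wanted:
        raise ValueError("none of the requested CPUs are available to this process")
    os.sched_setaffinity(pid, wanted)
    return os.sched_getaffinity(pid)
```

For example, `pin_to_cpus(parse_cpulist(open("/sys/devices/system/node/node0/cpulist").read()))` would keep the calling process on node 0's CPUs; for a VM you'd pass the hypervisor process's PID instead of 0.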

Well, I was asking what the actual result was. GP (who didn't respond to me) mentioned it was an experiment. TheDong gave experimental results in a reply though.
The server hosting the article above is also running 38 other containers - databases, WordPress blogs, Jenkins (CI), Gogs (Git), etc. And it only has 4 GB of RAM.
I'll have to ask tianon to see what he says. I don't remember the number.
This thread is kind of stale, but if you do ask and reply (you can also email me at the address in my profile), I'd appreciate it.
Another example of a really small container that doesn't do much, but is made without using assembly directly is the Docker "hello-world". It's built from C without linking in libc:

https://github.com/docker-library/hello-world/blob/master/he...

I always thought this was a bit misleading. A "hello world" container is 1 kB, but the bare minimum container that does something useful in practice is rarely less than 100 MB in size.

If you base off alpine, you can get useful containers quite a lot smaller than 100MB.

One example I use is an agent I deploy to Kubernetes clusters to do some security scanning. The scripts are Ruby and the image clocks in at 9MB compressed: https://hub.docker.com/r/raesene/kaa-agent/tags/

On the same note I did a mariadb container that is ~12mb: https://hub.docker.com/r/jbergstroem/mariadb-alpine/

If you're into Go, it's not too hard to get very small (<5MB) shippables by statically compiling against musl and using UPX. Here's a somewhat scrubbed Dockerfile for a gRPC/REST service I use at work: https://gist.github.com/jbergstroem/680cb7db6f90319dcd7666f3...

5MB still sounds like a lot, considering you could squeeze Linux 1.3 with a (compressed) rootfs onto a 1.44MB floppy... Does the runtime really do that much more than a full (albeit old) OS kernel plus a C library/runtime and apps?
The entirety of Debian 0.97 (kernel, userspace, packages) fit on two floppies back in 1994 :P
For that I reckon you'd have to file a bug with golang.
Yes, I use Alpine for a lot of my other containers. I love the simplicity of the package manager as well.
Alpine's package manager has the great property that you don't need to update the index in order to fetch a package IIRC; the whole `apt-get update && apt-get install && <cleanup apt-cache>` dance is quite tedious in debian-based Docker containers.
No, you still need to, but there's a compact syntax (`apk add --no-cache`) that fetches the index and discards it within a single `add` command. It's unavoidable: somewhere, some querying has to happen in order to map the package name/version to a download link.
I find the haproxy (Alpine) Dockerfile a great example of how to keep container image size down. It uses the syntax you're referring to, temporary virtual packages for build dependencies (these should probably be multi-stage builds today), and static linking: https://github.com/docker-library/haproxy/blob/2d393f2b59824...
Well, you can drop static builds of useful Go programs, which can easily be much smaller than 100MB, onto Docker scratch images. :)
> but the bare minimum container that does something useful in practice is rarely less than 100 MB in size.

I’ve made containers using code written in most common programming languages (python/go/ruby/c/rust/heck even PHP/etc) which were easily under 100mb, most significantly less. If your containers are frequently >100mb, I say you’re either using the JVM or are doing it wrong!

Actually, even the entire JVM is below 60MB without Jigsaw; with Jigsaw it can go as low as 10MB.
Yeah, I can usually make JVM containers under 100MB too, but they're not always that small, so I wanted to give a little benefit of the doubt there.
Interesting. I must confess, I didn't realize there was a way to make syscalls directly from C without resorting to inline assembly.
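In C the usual route is libc's generic `syscall()` wrapper (see syscall(2)), which takes the raw syscall number, so no inline asm is needed; this may or may not be exactly what the hello-world source does. Here's a rough Python analogue that reaches the same wrapper through ctypes. Syscall numbers are architecture-specific: getpid is 39 on x86-64 and 172 on arm64 (which uses the kernel's generic table); other architectures aren't covered here.

```python
import ctypes
import os
import platform

# getpid's syscall number differs per architecture.
SYS_GETPID = {"x86_64": 39, "aarch64": 172}

# On Linux, CDLL(None) exposes symbols already loaded into the process, including libc.
libc = ctypes.CDLL(None, use_errno=True)
libc.syscall.restype = ctypes.c_long


def raw_getpid():
    """Invoke getpid by number through libc's syscall() wrapper, not the named getpid()."""
    nr = SYS_GETPID[platform.machine()]
    return libc.syscall(ctypes.c_long(nr))


if __name__ == "__main__":
    print(raw_getpid(), os.getpid())  # the two values should match
```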
We already have a portable container format for executables with no dependencies... It's called an ELF binary... Why would you even put a static binary in a container in the first place? I don't get it.
Containers are not just about shipping binaries. In fact, they don't really add too much in that category. Namespace isolation and resource limiting with cgroups are the real benefits.

And in most real world cases, you will still need at least libc and ca-certificates.

I guess if you wanted to take advantage of the implicit cgroup that the process then becomes a part of? Of course you could set up such a cgroup without Docker, but maybe it's just more convenient in certain cases to use Docker for it to keep your deployment process more consistent.
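If you want to see that implicit cgroup, Linux exposes it in /proc/<pid>/cgroup, with one `hierarchy-ID:controllers:path` line per hierarchy (cgroup v2 has a single `0::/path` line). A small Python sketch that parses it; the function names are hypothetical, and the paths you get back depend on your Docker and cgroup setup.

```python
def parse_proc_cgroup(text):
    """Parse /proc/<pid>/cgroup lines of the form 'hierarchy-ID:controllers:path'.

    On cgroup v2 the single line is '0::/path', so the controller list is empty.
    maxsplit=2 keeps any ':' inside the path intact.
    """
    entries = []
    for line in text.splitlines():
        if not line:
            continue
        hierarchy_id, controllers, path = line.split(":", 2)
        entries.append((int(hierarchy_id), controllers.split(",") if controllers else [], path))
    return entries


def current_cgroups(pid="self"):
    """Return the cgroup entries for a process (Linux only)."""
    with open(f"/proc/{pid}/cgroup") as f:
        return parse_proc_cgroup(f.read())
```

Run inside a container, `current_cgroups()` would typically show a path mentioning the container; run on the host, it shows your session's cgroup instead.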
You still need to administer the box it runs on. Offloading that to others is often useful.
I guess you mean something like Compose or Kubernetes. Personally, I find it odd that these tools force you to use the Docker runtime. I would really love a more flexible definition in Kubernetes of what a "resource to be run" means. I know they support multiple container runtimes now, but what if I don't want a container runtime at all?

Nomad supports downloading and scheduling raw executables, which is nice (https://nomadproject.io), but then again, Kubernetes seems miles ahead in what it supports (autoscaling, volume claims, RBAC, etc.).

Otherwise, more traditional means of managing your services can be employed. I've gotten a lot of leverage out of systemd myself, which, by the way, supports all the features of a proper container runtime: you can namespace your executable, chroot it, limit what devices it can access, etc., which is kinda awesome. Check out `man systemd.exec` and `man systemd.resource-control`.

Kubernetes absolutely does not force you to use the docker runtime. In fact, there has been a lot of work to avoid this by creating the CRI[0].

Kubernetes also supports extensions like Third Party Resources or their successor, Custom Resource Definitions. KubeVirt[1] is an example of extending resources to include VMs.

[0] http://blog.kubernetes.io/2016/12/container-runtime-interfac...

[1] https://github.com/kubevirt/kubevirt

And CRI-O [0] is another open project making headway in that space.

[0] https://github.com/kubernetes-incubator/cri-o

CRI looks very interesting! Thanks for the pointer.
Even static binaries often have some dependencies, e.g. SSL root certificates. Plus, adding metadata to an ELF binary is not that convenient.
This is of course pretty pointless. It's neat, but these tiny containers don't really do anything, so it isn't much more than a cool trick.

However, there is a great lesson to take from this: you can create single-binary yet genuinely useful containers of just a few MB, which is nothing in practice for a typical server, and a lot more lightweight than the usual containers based on Alpine or Ubuntu.

In fact I run my static websites using that: a small 8-ish MB statically compiled web server "written" (it's really just library glue) in Go.

Do you put the static content into the container?
I've done this with a couple of applications, using something like fileb0x in the build process to convert the files into Go source which can then be compiled into the final executable.
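fileb0x itself is a Go tool; as a language-neutral illustration of the same idea, here's a Python sketch that turns a directory into a generated source module holding the file contents as base64 literals. The output format and the `generate_embed_module` name are made up for this example and are not what fileb0x emits.

```python
import base64
import tempfile
from pathlib import Path


def generate_embed_module(root):
    """Return Python source defining FILES = {relative path: bytes} for every file under root."""
    root = Path(root)
    lines = ["import base64", "", "FILES = {"]
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()  # forward slashes, like URL paths
        data = base64.b64encode(path.read_bytes()).decode("ascii")
        lines.append(f"    {rel!r}: base64.b64decode({data!r}),")
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Demo: embed a throwaway site directory and print the generated module.
    with tempfile.TemporaryDirectory() as site:
        Path(site, "index.html").write_text("<h1>hello</h1>")
        print(generate_embed_module(site))
```

The generated module is then compiled or shipped with the server, so the container needs nothing besides the one program.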
I have one that is 344 bytes: https://github.com/dseevr/cpu_consumer

It just runs a tight loop that consumes an entire core.

This is cool! thanks.
Container in a (240 character) tweet https://twitter.com/thaJeztah/status/913378165124423680

Replace -d with -D for macOS

If you want to reduce the size more, try emitting a single layer as a tarball which you can pipe to docker load. That will reduce your layer count to one, plus you can omit a bunch of the metadata that Docker includes when you build from a Dockerfile (it doesn't seem to care at runtime).

https://github.com/moby/moby/blob/master/image/spec/v1.md
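A rough Python sketch of that idea, using only the standard library. It writes the combined layout that `docker save` produces, as I understand it from the spec: a manifest.json, a config JSON named after its sha256, and one uncompressed layer tar whose sha256 goes into `rootfs.diff_ids`. The config is deliberately minimal (`architecture`, `os`, `config.Cmd`, `rootfs`); I haven't checked that every Docker version accepts exactly this, and the default tag and helper names are hypothetical.

```python
import hashlib
import io
import json
import tarfile


def _add_bytes(tar, name, data, mode=0o644):
    """Add an in-memory file to an open tar archive."""
    info = tarfile.TarInfo(name)
    info.size = len(data)
    info.mode = mode
    tar.addfile(info, io.BytesIO(data))


def build_layer(files):
    """files: {path in image: (bytes, mode)} -> uncompressed layer tar bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, (data, mode) in sorted(files.items()):
            _add_bytes(tar, name, data, mode)
    return buf.getvalue()


def build_image_tarball(files, cmd, tag="tiny:latest"):
    """Return a single-layer image tarball intended for `docker load`."""
    layer = build_layer(files)
    diff_id = "sha256:" + hashlib.sha256(layer).hexdigest()
    config = json.dumps({
        "architecture": "amd64",  # assumption: an x86-64 binary
        "os": "linux",
        "config": {"Cmd": cmd},
        "rootfs": {"type": "layers", "diff_ids": [diff_id]},
    }).encode()
    config_name = hashlib.sha256(config).hexdigest() + ".json"
    manifest = json.dumps(
        [{"Config": config_name, "RepoTags": [tag], "Layers": ["layer.tar"]}]
    ).encode()
    out = io.BytesIO()
    with tarfile.open(fileobj=out, mode="w") as tar:
        _add_bytes(tar, "manifest.json", manifest)
        _add_bytes(tar, config_name, config)
        _add_bytes(tar, "layer.tar", layer)
    return out.getvalue()
```

Something like `open("tiny.tar", "wb").write(build_image_tarball({"hello": (data, 0o755)}, ["/hello"]))` followed by `docker load -i tiny.tar` should then give you a one-layer image with no Dockerfile-generated metadata.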

Indeed.

Another useful technique, which has been recently introduced, is Multi-Stage Builds https://docs.docker.com/engine/userguide/eng-image/multistag... . This lets you avoid putting tools needed for compilation into the final image.

Notice that the executable itself contains less than 100 bytes of instructions, but the file is still 736 bytes. Even without optimising the Asm itself (I can see at least 2-3 bytes improvement at a glance), that could probably be reduced even further:

http://www.muppetlabs.com/~breadbox/software/tiny/teensy.htm...

736B may seem tiny to most people, but if you're working in Asm that's a lot; there are plenty of interesting things (beyond "call the OS a few times") the demoscene has done with smaller binaries. Here's an assortment of 512B ones:

http://www.pouet.net/prodlist.php?type[]=512b
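To get a feel for the fixed cost, here's a Python sketch that emits a minimal x86-64 Linux ELF: one 64-byte ELF header, one 56-byte PT_LOAD program header, no section headers, then 12 bytes of code that call exit(42) (not the pause loop from the post). That comes to 132 bytes; the teensy article goes much further by overlapping the headers, which this doesn't attempt. The load address 0x400000 is the conventional non-PIE choice and is assumed here.

```python
import struct

BASE = 0x400000   # assumed load address (conventional for non-PIE x86-64)
EHDR_SIZE = 64    # sizeof(Elf64_Ehdr)
PHDR_SIZE = 56    # sizeof(Elf64_Phdr)

# mov edi, 42 ; mov eax, 60 (exit) ; syscall
CODE = bytes.fromhex("bf2a000000" "b83c000000" "0f05")


def build_tiny_elf(code=CODE):
    entry = BASE + EHDR_SIZE + PHDR_SIZE          # code starts right after the headers
    total = EHDR_SIZE + PHDR_SIZE + len(code)
    e_ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)  # 64-bit, little-endian, version 1
    ehdr = struct.pack(
        "<16sHHIQQQIHHHHHH",
        e_ident,
        2,          # e_type: ET_EXEC
        0x3E,       # e_machine: EM_X86_64
        1,          # e_version
        entry,      # e_entry
        EHDR_SIZE,  # e_phoff: program header follows the ELF header
        0,          # e_shoff: no section headers
        0,          # e_flags
        EHDR_SIZE,  # e_ehsize
        PHDR_SIZE,  # e_phentsize
        1,          # e_phnum
        0, 0, 0,    # e_shentsize, e_shnum, e_shstrndx
    )
    phdr = struct.pack(
        "<IIQQQQQQ",
        1,             # p_type: PT_LOAD
        5,             # p_flags: PF_R | PF_X
        0,             # p_offset: map the whole file
        BASE, BASE,    # p_vaddr, p_paddr
        total, total,  # p_filesz, p_memsz
        0x1000,        # p_align
    )
    return ehdr + phdr + code
```

Writing the bytes to a file, running `chmod +x` on it, and executing it on x86-64 Linux should exit with status 42 (check with `echo $?`).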

I developed a little side project that aimed to be an open-source serverless alternative. It's called effe and uses Go as its main language.

The whole idea is to compile a Go program, put it into a Docker container, and have it listen on the network on a single HTTP endpoint.

It's interesting because useful images come in at less than 6MB :)

Link to the project: https://github.com/siscia/effe-tool
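I don't know how effe wires this up internally; as a rough illustration of the "one program, one HTTP endpoint" shape, here's a Python standard-library sketch. `handle` stands in for the user's function and is hypothetical, as are the JSON-in/JSON-out convention and the names `start` and `call`.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def handle(payload):
    """Hypothetical user function: the one piece of logic this service exposes."""
    return {"greeting": "hello, " + payload.get("name", "world")}


class SingleEndpointHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(handle(payload)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging


def start(host="127.0.0.1", port=0):
    """Serve in a background thread; inside a container you'd bind 0.0.0.0 instead."""
    server = ThreadingHTTPServer((host, port), SingleEndpointHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Bypass any http_proxy environment variable when talking to localhost.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def call(port, payload):
    """POST a JSON payload to the local endpoint and decode the JSON reply."""
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with _opener.open(req) as resp:
        return json.loads(resp.read())
```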

Reminded me of people making docker containers out of this: https://github.com/nemasu/asmttpd

e.g. https://hub.docker.com/r/0xff/asmttpd/

7KB web server container.

Are there any non-academic reasons to use such servers?
Embedded stuff like routers? Serving a user manual for some network connected thing with a microcontroller?
I did a similar thing, using a tiny executable that someone else had made. However, even though the contents were around 100 bytes, I couldn't get the container below some much larger size, maybe 512KB? This was in 2015, so maybe that limit has changed. Do any Docker folks know anything about this?
For reference, Kubernetes uses a `pause`[0] container for a similar reason.

[0]: https://github.com/kubernetes/kubernetes/tree/master/build/p...

Off-topic: what are the smallest Windows/.NET containers out there? I have some legacy stuff I'd like to dockerize, but the images end up so big and slow to deploy. I've just been rewriting them.
I'm not familiar with Docker, so bear with me, but why is another container needed for the reverse proxy?
The reverse proxy is configured to route requests to any container that is currently running. For example, if I have a Jenkins container, as long as it is running, the reverse proxy will send requests to it. In this particular case, I need to proxy a remote host. In order to do that, I have a container running with the appropriate labels, the proxy sees the container running, and it proxies requests to the remote host.
I think I'm being particularly dense here, but do you mean the proxy is proxying to your tiny 1KB container? If so, what happens to the traffic? Or does the proxy send traffic somewhere else because the tiny container is running? I.e., you're using the presence of a container with a label as configuration for the proxy?
I know, it's a bit difficult to explain and a tad confusing. No, the proxy is not sending any traffic to the container. Rather, the container, by simple virtue of being in the running state, tells the proxy to send requests to the domain specified by the label on the container.
What happens if the container isn't running?
Then the proxy will not route requests to the host specified by the container's label.
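In other words, the running container acts as a configuration record. A minimal Python sketch of that routing rule, with hypothetical label names (`proxy.domain`, `proxy.target`) and a plain list of containers instead of the Docker API; real label-driven proxies each use their own label conventions.

```python
from dataclasses import dataclass, field

# Hypothetical label names, for illustration only.
DOMAIN_LABEL = "proxy.domain"
TARGET_LABEL = "proxy.target"


@dataclass
class Container:
    name: str
    running: bool
    labels: dict = field(default_factory=dict)


def build_routes(containers):
    """Map domain -> upstream URL using only containers that are currently running."""
    routes = {}
    for c in containers:
        if not c.running or DOMAIN_LABEL not in c.labels:
            continue
        # With a target label the container is only a marker and traffic goes to the
        # remote host; without one, traffic goes to the container itself.
        routes[c.labels[DOMAIN_LABEL]] = c.labels.get(TARGET_LABEL, f"http://{c.name}")
    return routes
```

Stopping the marker container removes its domain from the table on the next rebuild, which matches the behaviour described above.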
This was a really fascinating read. Enjoyed it thoroughly