As a kid I was a compulsive liar and would make up non-existent stories.
With the advent of search engines I discovered that any story I can come up with actually happened somewhere in the world and has an article about it already.
Example: Fake barn county. You're driving in a county and see lots of barns. You point to one and say "that's a barn". Unbeknownst to you, almost all the barns are not barns but are facades. By accident you actually are pointing at one of the few real barns.
Pretty cool, and very educational for people not familiar with how these things work.
That said (and I've said this before), this is not really Docker. It's running containers, not the same thing. If you want to compare it to anything, it's runc, but that's not a good headline :)
Not that the Docker architecture is that clean, but it is the combination of ideas it brought to the table that made Docker Docker:
- have APIs to do everything, from launching workloads to building images.
- combining layered filesystems with os-level namespaces
- package format for "images" coupled with a distribution system
If it was just starting containers, that was already possible for a good while (and many shared hosting providers already did this).
I think many people underestimate the importance of the first point, having an API to do all this. It's this combination of ideas that democratised cloud computing; it is what makes the bigger picture possible. While the Docker API itself is not very important anymore, it showed the possibilities and also made its own limitations very apparent, and at the beginning nobody had solutions for those. It took things like Mesos and Kubernetes to take it to the next level, with the latter having become the de facto standard container API.
Honestly, I don't like lines of code as a metric for anything over the novelty amount of 1. And even then, that's usually some demonic Python list comprehension code.
That being said, the code here is pretty approachable and they weren't noticeably trying to cram it into fewer lines. I like the library, just not this marketing.
LoC makes a good bandpass filter. Too few and the code is usually hard to understand, too many and the code is unnecessarily bloated. Once you've been coding in a language for a while you get a sense of what's 'just right' for a complex feature (assuming you understand the feature properly).
What really bugs me about LoC as a metric though is when people don't count libraries. "I made a 3D engine in 10 lines" is just a flat out lie if one of those lines is '#include "Unreal/Engine"'.
While I tend to agree, I think LoC can give a sense of scale for large systems and at least a tiny bit of insight about its potential complexity.
I recently worked with a client on an integration effort that had to touch many different points on a (massive, for me) codebase with tens of millions of LoC. For that, LoC was the only reasonable metric I could come up with to try and convey the scale and complexity of the task at hand--being quite ignorant of the system's (and subsystems') architecture(s) at the time.
That was further complicated by the way the massive codebase supported all sorts of dynamic compositions and certain interactions needed to work with baseline compositions expressed in a form of markup totaling about 5 times the amount of the actual code base, amongst other things.
These folks thought the integration could be done for $80k tops and in a couple months. It took LoC metrics to get them to understand the potential complexity at hand and that a lot more time needed to be spent in assessment and design before jumping in.
...and it was probably around 1.5K lines early in its life.
But that misses the point. This is very helpful for understanding how a Docker-like system works since it's a small and mostly self-contained implementation. You can read through the entire thing and fully understand it.
It only cheats in the container registry handling where it pulls in github.com/google/go-containerregistry and for the network setup where it uses github.com/vishvananda/netlink. The rest is done in terms of Go stdlib and syscalls.
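To give a feel for what "Go stdlib and syscalls" can look like, here is a minimal sketch (not Gocker's actual code) that builds a command which would start in new UTS, PID and mount namespaces. The helper name namespacedCommand and the shell command are made up for illustration; it is Linux-only, and actually running it needs root or CAP_SYS_ADMIN.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// namespacedCommand (hypothetical helper) builds, but does not start, a
// command whose process is cloned into new UTS, PID and mount namespaces.
func namespacedCommand(path string, args ...string) *exec.Cmd {
	cmd := exec.Command(path, args...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
	}
	return cmd
}

func main() {
	// The hostname change happens inside the new UTS namespace only.
	cmd := namespacedCommand("/bin/sh", "-c", "hostname sandbox && hostname")
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "run failed (needs root or CAP_SYS_ADMIN):", err)
	}
}
```

A real runtime does a lot more after this point (pivoting to a new root, mounting /proc, dropping privileges), so treat this as the first step only.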
Early Docker used external utilities (lxc, iptables) and had the client/server stuff already so it's not as straightforward.
Yep. Docker is famously written in Go. The only reason I wrote this is as an educational tool to discuss how containers under Linux really work. It’s self-contained (pun unintended) except for the usage of the go-containerregistry package, which anyway does stuff unrelated to how containers are managed on Linux.
> What Programming Language Does Docker Use?
Docker is written in the Google Go (golang) programming language. To learn why Go was used, we’ll refer you directly to Google.[1]
* A Docker image normally contains all the dependencies of a program, or a set of programs. You can run libraries and other software of whatever versions, not necessarily available on your host system; they are already baked into the image. Usually a Docker image only needs a compatible kernel (this is a very lax restriction). It is a damn easy way to distribute software, especially such software which is not trivial to install and set up: tired of wrangling with Grafana installation? just pick a container from their site. And of course you can mount whatever you need inside the container when you need to, so it has controlled access to your filesystem(s).
* A Docker image normally runs with its own firewall. That is, you explicitly say which IPs / ports are available ("exposed") from the container; everything else is blocked. This helps isolate containers from the internet and from one another, and also helps build private networks between containers not exposed outside the host machine. Since containers already talk to each other via a network, it becomes easy to distribute them across many machines when you need to scale.
* A Docker image is built out of layers, and images can share layers. If you are reasonable enough to put common stuff in the bottom layers, then you can have multiple containers with a lot of common software installed inside (like a Node runtime, a JVM, etc.) which is stored on the host system only once.
* Docker images / containers are the standard for many cloud management systems. AWS can run containers directly. K8s operates on containers. Docker itself offers a simple but rather reasonable orchestration tool called docker-compose. It's great for small deployments and for things like running your setup locally, for development and integration testing.
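To make the layer-sharing point above a bit more concrete: one common way to stack layers on Linux is overlayfs, where the read-only layers become lowerdir entries and each container gets its own writable upperdir. The sketch below only builds the mount option string; the function name and all paths are invented for illustration, and it is not claimed to be what Docker does internally.

```go
package main

import (
	"fmt"
	"strings"
)

// overlayOptions builds the option string for an overlayfs mount.
// layers are given bottom-to-top (base image first). overlayfs expects
// lowerdir entries with the top-most layer first, so the order is reversed.
func overlayOptions(layers []string, upper, work string) string {
	lower := make([]string, len(layers))
	for i, dir := range layers {
		lower[len(layers)-1-i] = dir
	}
	return fmt.Sprintf("lowerdir=%s,upperdir=%s,workdir=%s",
		strings.Join(lower, ":"), upper, work)
}

func main() {
	// Two hypothetical containers share the same OS and JVM layers on disk;
	// only their writable upper directories differ.
	shared := []string{"/layers/os", "/layers/jvm"}
	fmt.Println(overlayOptions(shared, "/containers/a/upper", "/containers/a/work"))
	fmt.Println(overlayOptions(shared, "/containers/b/upper", "/containers/b/work"))
}
```

The resulting string would be passed as the data argument of syscall.Mount("overlay", target, "overlay", 0, opts), which needs root.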
Containers are not always better for everything you can think of. But they solve a number of common problems; some of these problems might be ones you'd like to have solved, some not.
The biggest advantage, though, is having to explicitly define all the edges:
- You need persistent storage? You better define it or you'll lose it on the next (re)deploy.
- You need to expose network services? Tell me which ones or it won't work.
If we're talking on small scale single server deploys, it makes backup, upgrade/rollback and migration of applications a LOT easier.
As long as you're not talking about a k8s cluster (which you should avoid with application architectures that aren't "cloud native"), I assume you'll have a local docker-compose file which you just start/stop to bring up the entire application stack you need (database/app/proxy server/monitoring/...) with one command, which means all external service dependencies are also contained in one 'stack'.
What I also use it for on small-scale apps is having a test environment of the same software running on the same droplet. I just put a Traefik reverse proxy in front of it that autodetects the docker containers, handles HTTPS/ACME certificates and routes the test-url to the test-containers, the real URL to the "production" containers, and they're all isolated.
The joke that I think actually explains it pretty well is that it eliminates the "well it works on _my_ machine" problem, by not just shipping the code, but shipping the machine.
I know that this joke isn't exactly what's happening. I just feel like it's a good way to explain the _idea_ of Docker to people who have no idea what it is.
Reproducibility is the biggest value in my opinion. A Dockerfile encapsulates all the messy dependencies in a single isolated environment. This also makes deployments easier.
I can answer this one. Sometimes you have lines like this:
FROM ubuntu:focal
RUN apt-get update && apt-get -y install libssl-dev
<your app details>
Since libssl-dev gets periodically updated (security updates and whatnot) if you build this now and build it again in 1 year you're very probably not going to get the same OpenSSL version. So it MIGHT be reproducible, but can easily give you different results depending on updates to the packages and the way your Dockerfile imports external dependencies. And that's before we even mention updates to the base container image.
Of course, you can refer to a specific container image id and pin all your packages, which would go a long way to improving reproducibility.
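If you want to enforce the "refer to a specific image id" part in tooling, a check along these lines could work. This is only a sketch: isPinnedByDigest is a made-up helper, the digest is an obvious placeholder rather than a real image, and it only looks at the reference string, not at package versions inside the image.

```go
package main

import (
	"fmt"
	"regexp"
)

// placeholderDigest is NOT a real image digest, just 64 hex characters.
const placeholderDigest = "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"

// digestRef matches references ending in "@sha256:" plus 64 lowercase hex digits.
var digestRef = regexp.MustCompile(`@sha256:[0-9a-f]{64}$`)

// isPinnedByDigest reports whether an image reference names an immutable
// content digest instead of a mutable tag such as "focal".
func isPinnedByDigest(ref string) bool {
	return digestRef.MatchString(ref)
}

func main() {
	for _, ref := range []string{"ubuntu:focal", "ubuntu@sha256:" + placeholderDigest} {
		fmt.Printf("%-90s pinned=%v\n", ref, isPinnedByDigest(ref))
	}
}
```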
Very nice! I love projects like this that return to first principles, and rebuild the core without the cruft. It is refreshing.
The dependency on netlink adds a little to the code weight. Some of this also feels like it could just be a shell script (sh, unlike Go, ships with built-in Linux support for netlink, and sticks with dotted-quad-string types for IP addresses instead of mixing with int32!)
I did not realize cgroups were this simple to manipulate. Thank you for the enlightenment.
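For anyone else curious, the "simple" part is that cgroups are driven by writing small text files. Below is a rough sketch assuming the cgroup v2 file names (memory.max, pids.max, cgroup.procs); Gocker itself may use a different hierarchy, and the limits type and function names here are invented. The demo writes into a temp directory so it runs without root; on a real host the directory would live under /sys/fs/cgroup and the controllers would have to be enabled in the parent group.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

// limits is a hypothetical bundle of resource limits for one container.
type limits struct {
	MemoryBytes int64
	MaxPids     int
}

// controlFile is one "write this value into this file" step.
type controlFile struct{ Name, Value string }

// controlFiles lists the writes in order: limits first, then the process.
func controlFiles(l limits, pid int) []controlFile {
	return []controlFile{
		{"memory.max", strconv.FormatInt(l.MemoryBytes, 10)},
		{"pids.max", strconv.Itoa(l.MaxPids)},
		{"cgroup.procs", strconv.Itoa(pid)},
	}
}

// applyLimits creates the group directory and writes each control file.
func applyLimits(cgroupDir string, l limits, pid int) error {
	if err := os.MkdirAll(cgroupDir, 0o755); err != nil {
		return err
	}
	for _, f := range controlFiles(l, pid) {
		path := filepath.Join(cgroupDir, f.Name)
		if err := os.WriteFile(path, []byte(f.Value), 0o644); err != nil {
			return fmt.Errorf("write %s: %w", f.Name, err)
		}
	}
	return nil
}

func main() {
	// Demo against a scratch directory instead of the real cgroup filesystem.
	dir, err := os.MkdirTemp("", "cgroup-demo")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer os.RemoveAll(dir)
	if err := applyLimits(dir, limits{MemoryBytes: 64 << 20, MaxPids: 32}, os.Getpid()); err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println("wrote control files under", dir)
}
```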
I did a quick search for whether "fun" Hex strings are reachable, but didn't find any (e.g. the canonical http://0xcafebabe). Random combination of hex-words are login pages to web cameras or cable modems. I didn't try to e.g. replace "e" with "3", e.g. http://0xcaf3babe/
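For reference, mapping one of those hex hosts to the dotted quad it stands for is a few lines of Go. A small sketch (the function name is made up), treating the value as a big-endian 32-bit IPv4 address:

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// hexHostToDottedQuad interprets a host like "0xcafebabe" as a 32-bit
// IPv4 address and returns its dotted-quad form.
func hexHostToDottedQuad(host string) (string, error) {
	// Base 0 lets ParseUint honour the "0x" prefix; bitSize 32 rejects
	// anything that does not fit in an IPv4 address.
	n, err := strconv.ParseUint(host, 0, 32)
	if err != nil {
		return "", fmt.Errorf("not a 32-bit number: %w", err)
	}
	ip := net.IPv4(byte(n>>24), byte(n>>16), byte(n>>8), byte(n))
	return ip.String(), nil
}

func main() {
	for _, host := range []string{"0xcafebabe", "0xcaf3babe"} {
		quad, err := hexHostToDottedQuad(host)
		if err != nil {
			fmt.Println(host, "->", err)
			continue
		}
		fmt.Println(host, "->", quad) // 0xcafebabe -> 202.254.186.190
	}
}
```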
runc - which is the low-level component that does the actual container launching in Docker and other runtimes - is mostly written in Go and quite approachable[1], if you're curious what a production-ready container runtime looks like.
Namespaces look simple on the surface, but there are plenty of subtleties, particularly when using Go:
- `runtime.LockOSThread()` has to be called before entering a namespace to pin the goroutine to a specific OS thread. The unshare call affects only the current thread[2][3]. Even then, you have to be careful not to spawn any new goroutines[4]. For this reason, parts of runc are currently written in C (you could technically implement it in pure Go, but the maintainers believe it's easier to reason about the C implementation).
- The container runtime has to reexec itself from a copy of itself in a memfd to prevent the container from writing to /proc/self/exe[5][6].
- Various race conditions and symlink attacks during container setup[7][8].
- Some parts of the container initialization have to be done after switching to the new rootfs, which is attacker-controlled territory[9][10].
- ... and plenty of other gotchas, the runc code is full of comments that explain why things have to be done in particular ways.
Obviously, Gocker is an experiment and does none of these things, and you shouldn't run it on anything that you care about :) Sometimes things are complex for a reason.
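To illustrate just the first gotcha (thread pinning), here is a rough sketch of the pattern, not runc's code. The helper onLockedThread is made up. It locks a fresh goroutine to its OS thread and never unlocks it, so the Go runtime terminates that thread when the goroutine exits instead of handing a thread with modified namespace state to other goroutines. It is Linux-only, the unshare in main needs root, and it does nothing about the other items in the list.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"syscall"
)

// onLockedThread runs fn in a new goroutine that is pinned to its OS thread.
// The goroutine deliberately exits without calling UnlockOSThread, so the
// runtime discards the thread rather than reusing it.
func onLockedThread(fn func() error) error {
	errc := make(chan error, 1)
	go func() {
		runtime.LockOSThread()
		errc <- fn()
	}()
	return <-errc
}

func main() {
	err := onLockedThread(func() error {
		// unshare affects only the calling thread, hence the pinning above.
		if err := syscall.Unshare(syscall.CLONE_NEWUTS); err != nil {
			return fmt.Errorf("unshare: %w", err)
		}
		return syscall.Sethostname([]byte("sandbox"))
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, "namespace setup failed (needs root):", err)
		return
	}
	// The main goroutine runs on a different thread, so it should still
	// be in the original UTS namespace.
	name, _ := os.Hostname()
	fmt.Println("hostname seen from main goroutine:", name)
}
```

Note that fn must not start goroutines of its own, since those would run on other threads outside the new namespace.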
> Or Gocker, an implementation of Docker in go...
https://news.ycombinator.com/item?id=16119842