Gocker: Docker implemented in 1.3k lines of Go (unixism.net)
186 points by shuss 2191 days ago
12 comments

My 2018 joke has been implemented.

> Or Gocker, an implementation of Docker in go...

https://news.ycombinator.com/item?id=16119842

As a kid I was a compulsive liar and would make up non-existent stories.

With the advent of search engines I discovered that any story I can come up with actually happened somewhere in the world and has an article about it already.

Sounds like a Gettier problem / Gettier case [0]

Example: Fake barn county. You're driving in a county and see lots of barns. You point to one and say "that's a barn". Unbeknownst to you, almost all the barns are not barns but are facades. By accident you actually are pointing at one of the few real barns.

[0] https://en.wikipedia.org/wiki/Gettier_problem

Hence, everyone is wrong and right at the same time, citation needed of course.
And the original implementation of Docker was only 1,146 LOC [0]

[0] https://github.com/moby/moby/commit/a27b4b8cb8e838d03a99b6d2...

Time for a Docker implementation in D.
Lol. Should link the root though, plenty of suggestions for others: https://news.ycombinator.com/item?id=16117172
And no one thought of Focker written in F#?
Is it too soon to talk about the C implementation?
gotta name the release candidate "Joe"....
Yes, of course, the same day :-)

https://news.ycombinator.com/item?id=16119712

Here's one in 100 lines of bash called, what else, "bocker": https://github.com/p8952/bocker
Now someone make one for fortran
I have nipples, can you milk me?
ASK HN: Help me explain to my wife why I'm doubled over laughing about a Fortran/Lactation joke.
Could someone explain the joke to me?
This is the perfect comment :)
Or a C implementation for the same reason...
Or Rust, "rocker" would make a great name.
So you mean Bash is a high level language compared to Go? /s
I was expecting 'shocker.'
Pretty cool, and very educational for people not familiar with how these things work

That said (and I've said this before), this is not really Docker. It's running containers, not the same thing. If you want to compare it to anything, it's runc, but that's not a good headline :)

Not that the Docker architecture is that clean, but it is the combination of ideas it brought to the table that made Docker Docker:

- having APIs to do everything, from launching workloads to building images
- combining layered filesystems with OS-level namespaces
- a package format for "images" coupled with a distribution system

If it was just starting containers, that was already possible for a good while (and many shared hosting providers already did this).

I think many people underestimate the importance of the first point, having an API to do all this. It's this combination of ideas that democratised cloud computing and makes the bigger picture possible. While the Docker API itself is no longer very important, it showed the possibilities and made its own limitations very apparent, and at the beginning nobody had solutions for those. It took things like Mesos and Kubernetes to take it to the next level, with the latter having become the de facto standard container API.

> If you want to compare it to anything, it's runc, but that's not a good headline :)

I don’t know, gunc is a pretty good name.

Honestly, I don't like lines of code as a metric for anything over the novelty amount of 1. And even then, that's usually some demonic Python list comprehension code.

That being said, the code here is pretty approachable and they weren't noticeably trying to cram it into fewer lines. Like the library, not this marketing.

LoC makes a good bandpass filter. Too few and the code is usually hard to understand, too many and the code is unnecessarily bloated. Once you've been coding in a language for a while you get a sense of what's 'just right' for a complex feature (assuming you understand the feature properly).

What really bugs me about LoC as a metric though is when people don't count libraries. "I made a 3D engine in 10 lines" is just a flat-out lie if one of those lines is '#include "Unreal/Engine"'.

While I tend to agree, I think LoC can give a sense of scale for large systems and at least a tiny bit of insight about its potential complexity.

I recently worked with a client on an integration effort that had to touch many different points on a (massive, for me) codebase with tens of millions of LoC. For that, LoC was the only reasonable metric I could come up with to try and convey the scale and complexity of the task at hand--being quite ignorant of the system's (and subsystems') architecture(s) at the time.

That was further complicated by the way the massive codebase supported all sorts of dynamic compositions, and certain interactions needed to work with baseline compositions expressed in a form of markup totaling about 5 times the size of the actual code base, amongst other things.

These folks thought the integration could be done for $80k tops and in a couple months. It took LoC metrics to get them to understand the potential complexity at hand and that a lot more time needed to be spent in assessment and design before jumping in.

Hopefully you’ll forgive this potentially obvious question: is LoC (still?) generally accepted to be bounded at 80 col?
In some circles yes. Linus recently weighed in against the 80 limit.

http://lkml.iu.edu/hypermail/linux/kernel/2005.3/08168.html

I assume you probably can't share the client, but can you share the general domain of the software?
General domain is modeling and simulation
Thanks! That's interesting!
Isn’t Docker implemented in Go already?
...and it was probably around 1.5K lines early in its life.

But that misses the point. This is very helpful for understanding how a Docker-like system works since it's a small and mostly self-contained implementation. You can read through the entire thing and fully understand it.

It only cheats in the container registry handling where it pulls in github.com/google/go-containerregistry and for the network setup where it uses github.com/vishvananda/netlink. The rest is done in terms of Go stdlib and syscalls.

Early Docker used external utilities (lxc, iptables) and had the client/server stuff already so it's not as straightforward.

Yep. Docker is famously written in Go. The only reason I wrote this is as an educational tool to discuss how containers under Linux really work. It’s self-contained (pun unintended) except for the usage of the go-containerregistry package, which anyway does stuff unrelated to how containers are managed on Linux.
Doesn't matter, rewrite it and make the name a gortmanteau.
gortmanteau

Sir we need you to come with us.

take your upvote and get out :))
I think you mean, take your upvote and go. ;)
take your upvote, check err and go.
Yes. TIL.

> What Programming Language Does Docker Use? Docker is written in the Google Go (golang) programming language. To learn why Go was used, we’ll refer you directly to Google.[1]

[1] https://blog.stoneriverelearning.com/docker-101-what-is-dock...

Yeah both Docker and Podman are written in Go
The HN title is not great. It's not docker, it's a mini-docker as the original title says. For example there is no "gocker build" command.

I wonder if that can run in unprivileged docker.

Check out this Liz Rice talk on implementing container tech from scratch. Very clear. https://www.youtube.com/watch?v=8fi7uSYlOdc
Can someone explain the value/purpose of docker to someone who (easily) deploys regular apps to a Digital Ocean droplet?
Easily.

* A Docker image normally contains all the dependencies of a program, or a set of programs. You can run libraries and other software of whatever versions, not necessarily available on your host system; they are already baked into the image. Usually a Docker image only needs a compatible kernel (this is a very lax restriction). It is a damn easy way to distribute software, especially such software which is not trivial to install and set up: tired of wrangling with Grafana installation? just pick a container from their site. And of course you can mount whatever you need inside the container when you need to, so it has controlled access to your filesystem(s).

* A Docker image normally runs with its own firewall. That is, you explicitly say which IPs / ports are available ("exposed") from the container; everything else is blocked. This helps isolate containers from the internet and from one another, and also helps build private networks between containers not exposed outside the host machine. Since containers already talk to each other via a network, it becomes easy to distribute them across many machines when you need to scale.

* A Docker image is built out of layers, and images can share layers. If you are reasonable enough to put common stuff in the bottom layers, then you can have multiple containers with a lot of common software installed inside (like a Node runtime, a JVM, etc.) which is stored on the host system only once.

* Docker images / containers are the standard for many cloud management systems. AWS can run containers directly. K8s operates on containers. Docker itself offers a simple but rather reasonable orchestration tool called docker-compose. It's great for small deployments and for things like running your setup locally, for development and integration testing.

Containers are not always better for everything you can think of. But they solve a number of common problems; some of these problems might be ones you'd like to have solved, some not.

The biggest advantage, though, is having to explicitly define all the edges:

- You need persistent storage? You'd better define it or you'll lose it on the next (re)deploy.
- You need to expose network services? Tell me which ones or it won't work.

If we're talking on small scale single server deploys, it makes backup, upgrade/rollback and migration of applications a LOT easier.

As long as you're not talking about a k8s cluster (which you should avoid with application architectures that aren't "cloud native"), I assume you'll have a local docker-compose file that you just start/stop to bring up the entire application stack you need (database/app/proxy server/monitoring/...) with one command, which means all external service dependencies are also contained in one 'stack'.

What I also use it for on small-scale apps is having a test environment of the same software running on the same droplet. I just put a Traefik reverse proxy in front of it that autodetects the docker containers, handles HTTPS/ACME certificates and routes the test-url to the test-containers, the real URL to the "production" containers, and they're all isolated.

The joke that I think actually explains it pretty well is that it eliminates the "well it works on _my_ machine" problem, by not just shipping the code, but shipping the machine.
It's also a great way to make sure your build artifacts are owned by root so you don't accidentally delete them.
*shipping the userland

It's not the equivalent of handing over a VM image. You're still open to unintended sensitivity to kernel versions, for instance.

I know that this joke isn't exactly what's happening. I just feel like it's a good way to explain the _idea_ of Docker to people who have no idea what it is.
Reproducibility is the biggest value in my opinion. A Dockerfile encapsulates all the messy dependencies in a single isolated environment. This makes deployments easier too.
I would say that's portability rather than reproducibility. Docker increases the extent of, but doesn't guarantee, reproducibility.
In which case does it not guarantee reproducibility?
I can answer this one. Sometimes you have lines like this:

    FROM ubuntu:focal
    RUN apt-get -y install libssl-dev
    <your app details>
Since libssl-dev gets periodically updated (security updates and whatnot) if you build this now and build it again in 1 year you're very probably not going to get the same OpenSSL version. So it MIGHT be reproducible, but can easily give you different results depending on updates to the packages and the way your Dockerfile imports external dependencies. And that's before we even mention updates to the base container image.

Of course, you can refer to a specific container image id and pin all your packages, which would go a long way to improving reproducibility.

So it's wrong (or rather uninformed) usage of Docker that leads to this, the tech itself is sound and does guarantee reproducibility.
It's for reifying brittle engineering - if you're doing well without it, you're already at a level above it.
Yeah, I've had a hard time justifying using docker yet too, but hopefully I find a compelling case
Has anyone made an "X implemented in N lines of Y" site yet?
Not exactly the same, but there's [1] and [2].

1: http://aosabook.org/

2: https://github.com/danistefanovic/build-your-own-x

500 Lines or Less is a great book in that genre.

https://github.com/aosabook/500lines/

Rosettacode is a bit like that. You'd have to do your own line count, though.
How many lines of code can you make it in?
> How many lines of code can you make it in?

xinylines.io - implemented in zero lines of code, and one large shameful html file.

This talk by Bryan Cantrill on the history of OS-level virtualization is great.

https://youtu.be/hgN8pCMLI2U

Apparently the first version of Jails was a few hundred lines of code.

Very nice! I love projects like this that return to first principles, and rebuild the core without the cruft. It is refreshing.

The dependency on netlink adds a little to the code weight. Some of this also feels like it could just be a shell script (sh, unlike Go, ships with built in Linux support for netlink, and sticks with dotted-quad-string types for IP addresses instead of mixing with int32!)

I did not realize cgroups were this simple to manipulate. Thank you for the enlightenment.

TBH, all network utilities I ever saw accept the int32 form of IP addresses. E.g. ping 127.0.0.1 can also be written as

  ping 0x7f000001
Try it; it works.
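
If you want to check the arithmetic, here is a small Go sketch that converts between the dotted-quad and 32-bit forms. The helper names `ipv4ToUint32` and `uint32ToIPv4` are mine; the only library calls are `net.ParseIP` and `encoding/binary`.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

// ipv4ToUint32 packs a dotted-quad IPv4 address into a big-endian uint32.
func ipv4ToUint32(s string) (uint32, error) {
	ip := net.ParseIP(s).To4()
	if ip == nil {
		return 0, fmt.Errorf("not an IPv4 address: %q", s)
	}
	return binary.BigEndian.Uint32(ip), nil
}

// uint32ToIPv4 is the inverse: it unpacks the integer into four bytes.
func uint32ToIPv4(n uint32) net.IP {
	b := make([]byte, 4)
	binary.BigEndian.PutUint32(b, n)
	return net.IP(b)
}

func main() {
	n, err := ipv4ToUint32("127.0.0.1")
	if err != nil {
		panic(err)
	}
	fmt.Printf("0x%08x %d\n", n, n)       // prints "0x7f000001 2130706433"
	fmt.Println(uint32ToIPv4(0xacd91124)) // prints "172.217.17.36"
}
```

Note that Go's own `net.ParseIP` does not accept the hex or integer spelling; it is the C resolver functions behind tools like ping that do.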
URLs too, though Chrome makes them canonical:

http://0xacd91124/ -- Google

I did a quick search for whether "fun" hex strings are reachable, but didn't find any (e.g. the canonical http://0xcafebabe). Random combinations of hex words lead to login pages for web cameras or cable modems. I didn't try replacing "e" with "3", e.g. http://0xcaf3babe/

You can also just provide an integer. That works too.
Cool educational project!

runc - which is the low-level component that does the actual container launching in Docker and other runtimes - is mostly written in Go and quite approachable[1], if you're curious what a production-ready container runtime looks like.

Namespaces look simple on the surface, but there are plenty of subtleties, particularly when using Go:

- `runtime.LockOSThread()` has to be called before entering a namespace to pin the goroutine to a specific OS thread. The unshare call affects only the current thread[2][3]. Even then, you have to be careful not to spawn any new goroutines[4]. For this reason, parts of runc are currently written in C (you could technically implement it in pure Go, but the maintainers believe it's easier to reason about the C implementation).

- The container runtime has to reexec itself from a copy of itself in a memfd to prevent the container from writing to /proc/self/exe[5][6].

- Various race conditions and symlink attacks during container setup[7][8].

- Some parts of the container initialization have to be done after switching to the new rootfs, which is attacker-controlled territory[9][10].

- ... and plenty of other gotchas, the runc code is full of comments that explain why things have to be done in particular ways.

Obviously, Gocker is an experiment and does none of these things, and you shouldn't run it on anything that you care about :) Sometimes things are complex for a reason.
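
To illustrate the first bullet, here is a minimal sketch of the thread-pinning pattern. It is not how runc does it; `runInNewUTSNamespace` is a hypothetical helper, and I picked the UTS namespace only because it is the easiest one to demonstrate. Linux only, and unshare normally requires CAP_SYS_ADMIN.

```go
package main

import (
	"fmt"
	"runtime"
	"syscall"
)

// runInNewUTSNamespace (hypothetical helper) runs fn on a dedicated OS
// thread that has been moved into a fresh UTS namespace. fn must not start
// goroutines and expect them to be in the namespace: they may be scheduled
// on other threads.
func runInNewUTSNamespace(fn func() error) error {
	errc := make(chan error, 1)
	go func() {
		// Pin this goroutine to its OS thread, because unshare(2) only
		// affects the calling thread.
		runtime.LockOSThread()
		// Deliberately no UnlockOSThread: since Go 1.10, a goroutine that
		// exits while locked takes its thread down with it, so the
		// modified thread is never handed to unrelated goroutines.
		if err := syscall.Unshare(syscall.CLONE_NEWUTS); err != nil {
			errc <- fmt.Errorf("unshare: %w", err)
			return
		}
		errc <- fn()
	}()
	return <-errc
}

func main() {
	err := runInNewUTSNamespace(func() error {
		// Only visible inside the new namespace; the host name is untouched.
		return syscall.Sethostname([]byte("sandbox"))
	})
	fmt.Println("result:", err)
}
```

Run unprivileged, this is expected to report a permission error from unshare instead of calling fn.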

[1]: https://github.com/opencontainers/runc

[2]: https://golang.org/doc/go1.10#runtime

[3]: https://github.com/golang/go/issues/20676

[4]: https://www.weave.works/blog/linux-namespaces-golang-followu...

[5]: https://github.com/opencontainers/runc/pull/1984

[6]: https://github.com/opencontainers/runc/commit/0a8e4117e7f715...

[7]: https://github.com/opencontainers/runc/issues?q=race+conditi...

[8]: https://github.com/cyphar/filepath-securejoin

[9]: https://github.com/opencontainers/runc/pull/2207

[10]: https://github.com/opencontainers/runc/issues/2128