by compsciphd 699 days ago
I'd note I didn't say you copied it, just that I created it first (i.e. "compare paper to docker"). also, as you note, it's possible someone else did it too, but at least my conception got through academic peer-review / patent office (yeah, there's a patent, never been attempted to be enforced though to my knowledge).

when I describe my work (I actually should have used quotes here), I generally give air quotes when saying it, or say "proto docker", as it provides context for what I did (there's also a lot of people who view docker as synonymous with containerization as a whole, and I say that containers existed way before me). I generally try to approach it humbly, but I am proud that I predicted and built what the industry seemingly needed (or at least is heavily using).

people have asked me why I didn't pursue it as a company, and my answer is a) I'm not much of an entrepreneur (main answer), and b) I felt it was a feature, not a "product", and would therefore only really be profitable for those that had a product that could use it as a feature (one could argue that product turned out to be clouds, i.e. they are the ones really making money off this feature). or as someone once said, a feature isn't necessarily a product and a product isn't necessarily a company.


I understood your point. I wanted to clarify, and in some ways connect with you.

At the time, I didn't know what I was doing. Maybe my colleagues knew a bit more, but I doubt that. I just wanted to stop waking up at night because our crappy container management code was broken again. The most brittle part was the lifecycle of containers (and their filesystem). I recall being very adamant about the layered filesystem, because it allowed sharing storage and RAM across running (containerized) processes. This saves on raw storage and RAM usage, but also on CPU time, because the same code (like the libc for example) is cached across all processes. Of course this only works if you have a lot of common layers. But I remember at the time, it made for very noticeable savings. Anyways, fun tidbits.

I wonder how much faster/better it would have been if inspired by your academic research. Or maybe not knowing anything made it so we solved the problems at hand in order. I don't know. I left the company shortly after. They renamed to Docker, and made it what it is today.

I like to say that Docker wouldn’t exist if the Python packaging and dependency management system weren’t complete garbage. You can draw a straight line from “run Python” to dotCloud to Docker.

Does that jibe with your experience/memory at all? How much of your motivation for writing Docker could have been avoided if there were a sane way to compile a Python application into a single binary?

It’s funny, this era of dotCloud type IaaS providers kind of disappeared for a while, only to be semi-revived by the likes of Vercel (who, incidentally, moved away from a generic platform for running containers, in favor of focusing on one specific language runtime). But its legacy is containerization. And it’s kind of hard to imagine the world without containers now (for better or worse).

I do not think the mess of dependency management in Python got us to Docker/containers. Rather Docker/containers standardized deploying applications to production. Which brings reproducibility without having to solve dependency management.

Long answer with context follows.

I was focused on container allocation and lifecycle. So my experience, recollection, and understanding of what we were doing is biased with this in mind.

dotCloud was offering a cheaper alternative to virtual machines. We started with pretty much full Linux distributions in the containers. I think some still had a /boot with the unused Linux kernel in there.

I came to the job with some experience testing and deploying Linux at scale quickly: preparing images with chroot, making a tarball, and then distributing it over the network (via multicast from a seed machine) with a quick grub update. This was for quickly installing Counter-Strike servers for tournaments in Europe. In those days it was one machine per game server. I was also used to running those tarballs as virtual machines for thorough testing. To save storage space on my laptop at the time, I would hard-link together all the common files across my various chroot directories. I would only make a tarball to ship it out.
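A minimal sketch of that hard-linking trick, for readers who have not seen it. This is not the script that was actually used: the function names are hypothetical, files are matched by content hash only, and all directories are assumed to sit on one filesystem (hard links cannot cross filesystems).

```python
import hashlib
import os


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def hardlink_duplicates(roots):
    """Replace identical regular files under `roots` with hard links.

    Returns the number of files replaced by a link. Simplification:
    only contents are compared; a real tool would probably also
    compare mode and ownership before linking.
    """
    seen = {}  # content digest -> first path seen with that content
    linked = 0
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                # Skip symlinks and anything that is not a regular file.
                if os.path.islink(path) or not os.path.isfile(path):
                    continue
                original = seen.setdefault(file_digest(path), path)
                # Nothing to do for the first copy or an existing link.
                if original == path or os.path.samefile(original, path):
                    continue
                # Create the link under a temporary name, then rename it
                # over the duplicate so the path never disappears.
                tmp = path + ".hardlink-tmp"
                os.link(original, tmp)
                os.replace(tmp, path)
                linked += 1
    return linked
```

One caveat with this approach: hard-linked files share their contents, so writing to such a file in place inside one chroot changes it in every chroot that links to it.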

It turned out my Counter-Strike tarballs from 2008 would run fine as containers in 2011.

The main competition was Heroku. They did not use containers at the beginning. And they focused on running one language stack very well. It was Ruby and a database I forget.

At dotCloud we could run anything. And we wanted to be known for serving everything. All your languages, not just one. So early on we started offering base images ready-made for specific languages and databases. It was so much work to support. We had a few base images per team member to maintain, while still trying to develop the platform.

The layered filesystem was to pack resources more efficiently on our servers. We definitely liked that it saved build time on our laptops when testing (we still had small and slow spinning disks in 2011).

So I wouldn't say that Docker wouldn't exist without the mess of dependency management in software. It just happened to offer a standardized interface between application developers, and the person running it in production (devops/sre).

The fact you could run the container on your local (Linux) machine was great for testing. Then people realized they could work around dependency hell and non-reproducible development environments by using containers.

they did it "simpler", i.e. academic work has to be "perfect" in a way a product does not. so (from my perspective), they punted the entire concept of making what I would refer to as a "layer aware linux distribution" and just created layers "on demand" (via RUN syntax of dockerfiles).

From an academic perspective, it's "terrible": so many duplicate layers out in the world. From a practical perspective of delivering a product, it makes a lot of sense.

It's also simpler in that I was trying to make it work for both what I call "persistent" containers (à la pets in the terminology) that could be upgraded in place and "ephemeral" containers (à la cattle), when in practice I'm not sure the work to enable upgrading "persistent" containers in place (replacing layers on demand) is that useful (it's technologically interesting, but that's different than useful).

My argument for this was that this actually improves runtime upgrading of systems. With dpkg/rpm, if you upgrade libc, your system is actually temporarily in a state where it can't run any applications (in the delta of time when the old libc .so is deleted and the new one is created in its place, or completely overwrites it); any program that attempts to run in that (very) short period of time will fail (due to libc not really existing). By having a mechanism where layers could be swapped in an essentially atomic manner, no delete / overwrite of files occurs and therefore there is zero time when programs won't run.
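The "swap instead of delete / overwrite" idea can be illustrated with a much simpler stand-in. The sketch below is not the layer-swapping mechanism from the paper: it is an analogy that retargets a symlink with a single rename, and it assumes a POSIX filesystem where rename(2) replaces the destination atomically. The function name and paths are hypothetical.

```python
import os


def atomic_switch(link_path, new_target):
    """Point the symlink `link_path` at `new_target` in one atomic step.

    The new symlink is built under a temporary name and then renamed
    over the old one, so there is no moment at which `link_path` is
    missing or half-written (assumption: POSIX rename semantics).
    """
    tmp = link_path + ".new"
    if os.path.lexists(tmp):
        os.remove(tmp)  # clean up a leftover from an interrupted run
    os.symlink(new_target, tmp)
    os.replace(tmp, link_path)  # rename(2) acts on the link itself
```

With the old and new versions of a library installed side by side in, say, `v1/` and `v2/`, a caller would run `atomic_switch("current", "v2")`, and anything resolving paths through `current/` sees either the old tree or the new one, never a gap.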

In practice, the fact that a real world product came out with a very similar design/implementation makes me feel validated (i.e. a lot of PhD work is one-offs, never to see the light of day after the papers for it are published).

> so (from my perspective), they punted the entire concept of making what I would refer to as a "layer aware linux distribution"

Would you consider there to be any 'layer-aware Linux distributions' today, e.g., NixOS, GuixSD, rpm-ostree-based distros like Fedora CoreOS, or distri?

> so much duplicate layers out in the world

Have you seen this, which lets existing container systems understand a Linux package manager's packages as individual layers?

https://github.com/pdtpartners/nix-snapshotter

(Not GP.)

NixOS can share its Nix store with child (systemd-nspawn) containers. That is, if you go all in, package everything using Nix, and then carefully ensure you don’t have differing (transitive build- or run-time) dependency versions anywhere, those dependencies will be shared to the maximum extent possible. The amount of sharing you actually get matches the effort you put into making your containers use the same dependency versions. No “layers”, but still close to what you’re getting at, I think.

On the other hand, Nixpkgs (which NixOS is built on top of) doesn’t really follow a discipline of minimizing package sizes to the extent that, say, Alpine does. You fairly often find documentation and development components living together with the runtime ones, especially for less popular software. (The watchword here is “closure size”, as in the size of a package and all of its transitive runtime dependencies.)
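To make "closure size" concrete, here is a small sketch over a made-up dependency graph. The package names, sizes, and function names are all hypothetical and nothing here talks to Nix itself; it only shows the definition (a package plus everything reachable through its runtime dependencies, each counted once).

```python
def closure(package, deps):
    """Return `package` plus all of its transitive dependencies."""
    seen = set()
    stack = [package]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # shared dependencies are only counted once
        seen.add(current)
        stack.extend(deps.get(current, ()))
    return seen


def closure_size(package, deps, sizes):
    """Sum the sizes of every package in the closure of `package`."""
    return sum(sizes[name] for name in closure(package, deps))


# Hypothetical graph: both "app" and "libfoo" depend on "libc".
DEPS = {"app": ["libfoo", "libc"], "libfoo": ["libc"], "libc": []}
SIZES = {"app": 2, "libfoo": 3, "libc": 10}  # made-up sizes, in MB

print(closure_size("app", DEPS, SIZES))  # 15
```

In this toy graph, shipping docs or development files inside `libc` would inflate the closure of everything that depends on it, which is why splitting those into separate outputs matters.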

> On the other hand, Nixpkgs (which NixOS is built on top of) doesn’t really follow a discipline of minimizing package sizes to the extent that, say, Alpine does. You fairly often find documentation and development components living together with the runtime ones, especially for less popular software. (The watchword here is “closure size”, as in the size of a package and all of its transitive runtime dependencies.)

Yep. I remember before Nix even had multi-output derivations! I once broke some packages trying to reduce closure sizes when that feature got added, too. :(

Besides continuing to split off more dev and doc outputs, it'd be cool if somehow Nixpkgs had a `pkgsForAnts` just like it has a `pkgsStatic`, where packages just disable more features and integrations. On the other hand, by the time you're really optimizing your Nix container builds it's probably well worth it to use overrides and build from source anyway, binary cache be damned.

I'll try to get back to this to give a proper response, but can't promise.