Hacker News new | ask | show | jobs
by bombela 699 days ago
> As the person who created docker (well, before docker - see https://www.usenix.org/legacy/events/atc10/tech/full_papers/... and compare to docker)

I picked the name and wrote the first prototype (python2) of Docker in 2012. I had not read your document (dated 2010). I didn't really read English that well at the time, I probably wouldn't have been able to understand it anyways.

https://en.wikipedia.org/wiki/Multiple_discovery

More details for the curious: I wrote the design doc and implemented the prototype. But not in a vacuum. It was a lot work with Andrea, Jérôme and Gabriel. Ultimately, we all liked the name Docker. The prototype already had the notion of layers, lifetime management of containers and other fundamentals. It exposed an API (over TCP with zerorpc). We were working on container orchestration, and we needed a daemon to manage the life cycle of containers on every machine.

2 comments

I'd note I didn't say you copied it, just that I created it first (i.e. "compare paper to docker". also, as you note, its possible someone else did it too, but at least my conception got through academic peer-review / patent office, yeah, there's a patent, never been attempted to be enforced though to my knowledge).

when I describe my work (I actually should have used quotes here), I generally give air quotes when saying it, or say "proto docker", as it provides context for what I did (there's also a lot of people who view docker as synonymous with containerization as a whole, and I say that containers existed way before me). I generally try to approach it humbly, but I am proud that I predicted and built what the industry seemingly needed (or at least is heavily using).

people have asked me why I didn't pursue it as a company, and my answer is a) I'm not much of an entrepreneur (main answer), and b) I felt it was a feature, not a "product", and would therefore only really profitable for those that had a product that could use it as a feature (which one could argue that product turned out to be clouds, i.e. they are the ones really making money off this feature). or as someone once said a feature isn't necessarily a product and a product isn't necessarily a company.

I understood your point. I wanted to clarify, and in some ways connect with you.

At the time, I didn't know what I was doing. Maybe my colleagues did some more, but I doubt that. I just wanted to stop waking up at night because our crappy container management code was broken again. The most brittle part was the lifecycle of containers (and their filesystem). I recall being very adamant about the layered filesystem, because it allowed to share storage and RAM across running (containerized) processes. This saves in pure storage and RAM usage, but also in CPU time, because the same code (like the libc for example) is cached across all processes. Of course this only works if you have a lot of common layers. But I remember at the time, it made for very noticeable savings. Anyways, fun tidbits.

I wonder how much faster/better it would have been if inspired by your academic research. Or maybe not knowing anything made it so we solved the problems at hand in order. I don't know. I left the company shortly after. They renamed to Docker, and made it what it is today.

I like to say that Docker wouldn’t exist if the Python packaging and dependency management system weren’t complete garbage. You can draw a straight line from “run Python” to dotCloud to Docker.

Does that jive with your experience/memory at all? How much of your motivation for writing Docker could have been avoided if there were a sane way to compile a Python application into a single binary?

It’s funny, this era of dotCloud type IaaS providers kind of disappeared for a while, only to be semi-revived by the likes of Vercel (who, incidentally, moved away from a generic platform for running containers, in favor of focusing on one specific language runtime). But its legacy is containerization. And it’s kind of hard to imagine the world without containers now (for better or worse).

I do not think the mess of dependency management in Python got us to Docker/containers. Rather Docker/containers standardized deploying applications to production. Which brings reproducibility without having to solve dependency management.

Long answer with context follows.

I was focused on container allocation and lifecycle. So my experience, recollection, and understanding of what we were doing is biased with this in mind.

dotCloud was offering a cheaper alternative to virtual machines. We started with pretty much full Linux distributions in the containers. I think some still had a /boot with the unused Linux kernel in there.

I came to the job with some experience testing deploying Linux at scale quickly by preparing images with chroot before making a tarball to then distribute over the network (via multicast from a seed machine) with a quick grub update. This was for quickly installing Counter-Strike servers for tournament in Europe. In those days it was one machine per game server. I was also used to run those tarball as virtual machines for throughout testing. To save storage space on my laptop at the time, I would hard-link together all the common files across my various chroot directories. I would only tarball to ship it out.

It turned out my counter-strike tarballs from 2008 would run fine as containers in 2011.

The main competition was Heroku. They did not use containers at the beginning. And they focused on running one language stack very well. It was Ruby and a database I forget.

At dotCloud we could run anything. And we wanted to be known serving everything. All your languages, not just one. So early on we started offering base images ready made for specific languages and database. It was so much work to support. We had a few base images per team member to maintain, while still trying to develop the platform.

The layered filesystem was to pack ressources more efficiently on our servers. We definitely liked that it saved build time on our laptop when testing (we still had small and slow spinning disks in 2011).

So I wouldn't say that Docker wouldn't exist without the mess of dependency management in software. It just happened to offer a standardized interface between application developers, and the person running it in production (devops/sre).

The fact you could run the container on your local (Linux) machine was great for testing. Then people realized they could work around dependency hell and non reproducible development environment by using containers.

they did it "simpler", i.e. academic work has to be "perfect" in a way a product does not. so (from my perspective), they punted the entire concept of making what I would refer to as a "layer aware linux distribution" and just created layers "on demand" (via RUN syntax of dockerfiles).

From an academic perspective, its "terrible", so much duplicate layers out in the world, from a practical perspective of delivering a product, it makes a lot of sense.

It's also simpler from the fact that I was trying to make it work for both what I call "persistent" containers (ala pets in the terminology) that could be upgraded in place and "ephemeral" containers (ala cattle) when in practice the work to enable upgrading in place (replacing layers on demand) to upgrade "persistent" containers I'm not sure is that useful (its technologically interesting, but that's different than useful).

My argument for this was that this actually improves runtime upgrading of systems. With dpkg/rpm, if you upgrade libc, your systems is actually temporarily in a state where it can't run any applications (in the delta of time when the old libc .so is deleted and the new one is created in its place, or completely overwrites it), any program that attempts to run in that (very) short period time, will fail (due to libc not really existing). By having a mechanism where layers could be swapped in essentially an atomic manner, no delete / overwrite of files occurs and therefore there is zero time when programs won't run.

In practice, the fact that a real world product came out with a very similar design/implementation makes me feel validated (i.e. a lot of phd work is one offs, never to see the light of day after the papers for it are published).

> so (from my perspective), they punted the entire concept of making what I would refer to as a "layer aware linux distribution"

Would you consider there to be any 'layer-aware Linux distributions' today, e.g., NixOS, GuixSD, rpm-ostree-based distros like Fedora CoreOS, or distri?

> so much duplicate layers out in the world

Have you seen this, which lets existing container systems understand a Linux package manager's packages as individual layers?

https://github.com/pdtpartners/nix-snapshotter

(Not GP.)

NixOS can share its Nix store with child (systemd-nspawn) containers. That is, if you go all in, package everything using Nix, and then carefully ensure you don’t have differing (transitive build- or run-time) dependency versions anywhere, those dependencies will be shared to the maximum extent possible. The amount of sharing you actually get matches the effort you put into making your containers use the same dependency versions. No “layers”, but still close what you’re getting at, I think.

On the other hand, Nixpkgs (which NixOS is built on top of) doesn’t really follow a discipline of minimizing package sizes to the extent that, say, Alpine does. You fairly often find documentation and development components living together with the runtime ones, especially for less popular software. (The watchword here is “closure size”, as in the size of a package and all of its transitive runtime dependencies.)

> On the other hand, Nixpkgs (which NixOS is built on top of) doesn’t really follow a discipline of minimizing package sizes to the extent that, say, Alpine does. You fairly often find documentation and development components living together with the runtime ones, especially for less popular software. (The watchword here is “closure size”, as in the size of a package and all of its transitive runtime dependencies.)

Yep. I remember before Nix even had multi-output derivations! I once broke some packages trying to reduce closure sizes when that feature got added, too. :(

Besides continuing to split off more dev and doc outputs, it'd be cool if somehow Nixpkgs had a `pkgsForAnts` just like it has a `pkgsStatic`, where packages just disable more features and integrations. On the other hand, by the time you're really optimizing your Nix container builds it's probably well worth it to use overrides and build from source anyway, binary cache be damned.

I'll try to get back to this to give a proper response, but can't promise.
I’m really confused. Solomon Hykes is typically credited as the creator of Docker. Who are you? Why is he credited if someone else created it?
> Why is he credited if someone else created it?

This is the internet and just about everyone could be diagnosed with Not Invented Here syndrome. First one to get recognition for creating something that's already been created is just a popular meme.

Solomon was the CEO of dotCloud. Sébastien the CTO. I (François-Xavier "bombela") was one of the first software engineer employee. Along with Andrea, Jérôme and Louis with Sam as our manager.

When it became clear that we had reached the limits of our existing buggy code, I pushed hard to get to work on the replacement. After what seemed an eternity pitching Solomon, I was finally allowed to work on it.

I wrote the design doc of Docker with the help of Andrea, Jérôme, Louis and Gabriel. I implemented the prototype in Python. To this day, three of us will still argue who really choose the name Docker. We are very good friends.

Not long after, I left the company. Because I was under paid, I could barely make ends meet at the time. I had to borrow money to see a doctor. I did not mind, it's the start-up life, am I right? I worked 80h/week happy. But then I realized not everybody was under paid. And other companies would pay me more for less work. When I asked, Solomon refused to pay me more, and after being denied three times, I quit. I never got any shares. I couldn't afford to buy the options anyways, and they had delayed the paperwork multiple times, such that I technically quit before the vesting started. I went to Google, were they showered me with cash in comparison. The next morning after my departure from dotCloud, Solomon raised everybody's salary. My friends took me to dinner to celebrate my sacrifice.

I am not privy to all the details after I left. But here is what I know. Andrea rewrote Docker in Go. It was made open source. Solomon even asked me to participate as an external contributor. For free of course. As a gesture to let me add my name to the commit history. Probably the biggest insult I ever received in life.

dotCloud was renamed Docker. The original dotCloud business was sold to a German company for the customers.

I believe Solomon saw the potential of Docker for all, and not merely an internal detail within a distributed system designed to orchestrate containers. My vision was extremely limited, and focused on reducing the suffering of my oncall duties.

A side story: the German company transfered to me the trademark of zerorpc, the open source network library powering dotCloud. I had done a lot of work on it. Solomon refused to hand me off the empty github/zerorpc group he was squatting. He offered to grant me access but retain control. I went for github/0rpc instead. I did not have time nor money to spend on a lawyer.

By this point you might think I have a vendetta against a specific individual. I can assure you that I tried hard to paint things fairly with a flattering light.

He means he came up with the same concept, not literally created Docker.