I don't understand the point of Docker. It seems like a great product. For any serious production-grade containerization, I'd use a real virtualization solution like KVM or VMware.
It's containerization between trustworthy apps; it's not security containerization. What it gets you is, if you have one application that's designed to run well on RHEL 5 with /usr/bin/python pointing to Python 2.4, and another one that's designed to run well on Debian testing with a manual /usr/bin/python symlink to Python 3, you can give both of them what they want. This has nothing to do with security.
If you want Docker + security isolation, I'm intrigued by Clear Containers, which is a lightweight KVM-based virtualization thing:
If you want to try out Clear Containers, we merged it into the rkt container runtime a few weeks ago. And since rkt can run both Docker and appc container images, you can run any existing app container from this runtime. This blog post details Clear Containers in rkt:
> It's containerization between trustworthy apps; it's not security containerization.
Isn't that what a process is? The containers in Linux are based on Jails from FreeBSD and zones from Solaris. They are absolutely there for security.
Regarding the remaining part of your post, I understand what you are trying to show, but Python is a really bad example. You absolutely can have Python 2 and 3 side by side, or even different minor versions. And with virtualenv or pyvenv (which came with 3.3) you can even have multiple installations of the same version. If you add setuptools to your application you can easily generate a single-file package (I personally like wheel); deployment is then as simple as writing pip install myawesomeapp-1.0-py2.py3-none-any.whl, which downloads all dependencies. There is not much that Docker would help with; it only makes things more complex.
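As a rough sketch of the side-by-side point, this is how isolated environments can be created with the standard-library venv module (the programmatic counterpart of pyvenv). The environment names are made up for illustration, and with_pip=False is only there to keep the example quick.

```python
import os
import tempfile
import venv


def make_env(parent_dir, name):
    """Create an isolated Python environment under parent_dir/name."""
    env_dir = os.path.join(parent_dir, name)
    # with_pip=False skips bootstrapping pip to keep this sketch fast;
    # a real deployment would normally want with_pip=True.
    venv.create(env_dir, with_pip=False)
    return env_dir


def demo():
    """Build two environments for two hypothetical apps, side by side."""
    with tempfile.TemporaryDirectory() as parent:
        envs = [make_env(parent, name) for name in ("app1-env", "app2-env")]
        # Each environment carries its own pyvenv.cfg marker file.
        return [os.path.exists(os.path.join(e, "pyvenv.cfg")) for e in envs]
```

Each environment has its own site-packages, so the two apps can pin conflicting versions of the same dependency without touching the system Python.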
It's the level of isolation of a process, yes. Just as two processes can use their address spaces as they see fit without bothering each other, even loading different versions of the same library, under a Linux container, two applications can use their filesystem as they see fit without bothering each other, even using different versions of the same binary applications.
But the security isolation between two processes running as the same user account is extremely weak. While it's true that one process can't write to another one's memory directly, it's not a fundamental breach of the security policy if it can do so indirectly. There may be things to increase defense-in-depth (like Yama) but fundamentally if you're the same UID there is no security boundary. The same rule applies to containers.
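To make the same-UID point concrete, here is a small sketch: a file locked down to mode 0600 is still readable by any other process running under the same account. The file name and contents are invented for the example.

```python
import os
import subprocess
import sys
import tempfile


def read_from_other_process(path):
    """Read a file from a separate process that runs under the same UID."""
    code = "import sys; print(open(sys.argv[1]).read(), end='')"
    result = subprocess.run(
        [sys.executable, "-c", code, path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "secret.txt")
        with open(path, "w") as f:
            f.write("hunter2")
        os.chmod(path, 0o600)  # owner-only: no access for group or others
        # The child is a different process, but the same user, so it gets in.
        return read_from_other_process(path)
```

The permission bits only distinguish between users, not between processes of one user, which is the sense in which there is no boundary here.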
> python is a really bad example
Yeah, agreed. I was just trying to come up with something quick. If your app works with v(irtual)env, by all means just use that and stop messing with containers. However, if you've got some large closed-source app with a portion in Python, and it expects /usr/bin/python to both work and be some exact version, you need to virtualize the filesystem.
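For what "virtualize the filesystem" can mean at its simplest, here is a minimal sketch that runs a program with a different directory as its root, so that /usr/bin/python resolves inside that tree. It assumes non-malicious apps, a populated root tree, and root privileges; the function name is my own and this is not what Docker does internally.

```python
import os


def run_in_chroot(new_root, argv):
    """Run argv with new_root as its "/" and return the child's exit status.

    Calling chroot needs root privileges (CAP_SYS_CHROOT). If chroot or exec
    fails, the child exits with status 127.
    """
    pid = os.fork()
    if pid == 0:
        # Child: swap the filesystem root, then replace ourselves with the app.
        try:
            os.chroot(new_root)
            os.chdir("/")  # do not keep a working directory outside the new root
            os.execvp(argv[0], argv)
        except OSError:
            pass
        finally:
            os._exit(127)  # only reached if chroot or exec failed
    # Parent: wait for the child and report how it exited.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
```

With two prepared trees, something like run_in_chroot("/app1_root", ["python", "myapp1.py"]) would give each app its own /usr/bin/python. The paths here are placeholders.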
> It's the level of isolation of a process, yes. Just as two processes can use their address spaces as they see fit without bothering each other, even loading different versions of the same library, under a Linux container, two applications can use their filesystem as they see fit without bothering each other, even using different versions of the same binary applications.
I'm sorry for disagreeing, but that's what chroot() or even chdir() is supposed to do. It's not for security (a process can escape it), but they do provide isolation assuming there are no malicious actors.
Containers were created to provide security; a perfect example is the FreeBSD jail, which precedes Solaris zones. It is supposed to be a secure version of chroot() that cannot be escaped. It was successfully used in the early 2000s, before VMs, to provide shared hosting.
> But the security isolation between two processes running as the same user account is extremely weak. While it's true that one process can't write to another one's memory directly, it's not a fundamental breach of the security policy if it can do so indirectly. There may be things to increase defense-in-depth (like Yama) but fundamentally if you're the same UID there is no security boundary. The same rule applies to containers.
Agreed, except for the last sentence. With processes the isolation is weak because the same UID represents the same user; if you use a different UID, the isolation is enforced. The containers (assuming they are correctly set up) allow you to actually have two root accounts that can't interfere with each other.
> Yeah, agreed. I was just trying to come up with something quick. If your app works with v(irtual)env, by all means just use that and stop messing with containers. However, if you've got some large closed-source app with a portion in Python, and it expects /usr/bin/python to both work and be some exact version, you need to virtualize the filesystem.
Assuming these are not malicious, you can just do:

    chroot /app1_root python myapp1.py
> The containers (assuming they are correctly set up) allow you to actually have two root accounts that can't interfere with each other.
To the best of my knowledge, Docker (the official implementation) does not do that. rkt does, as described at the bottom of this blog post linked elsethread:
(The Linux implementation of this is somewhat poor, in that you need to have a separate UID reserved in the global namespace, and you can only do 1:1 maps in containers. A nicer implementation would treat the user principal as a (container, UID) tuple. I recall that Linux tried that, but gave up for backwards-compatibility reasons.)
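For reference, the kernel exposes this mapping in /proc/<pid>/uid_map as lines of the form "inside-start outside-start length". Below is a sketch of how a UID inside a user namespace translates to one in the parent namespace. The sample map values are made up, and the helper names are mine.

```python
def parse_uid_map(text):
    """Parse uid_map-style text into (inside_start, outside_start, length) tuples."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, length = (int(field) for field in line.split())
            entries.append((inside, outside, length))
    return entries


def to_outside_uid(entries, inside_uid):
    """Translate a UID seen inside the namespace to the UID outside it."""
    for inside, outside, length in entries:
        if inside <= inside_uid < inside + length:
            return outside + (inside_uid - inside)
    return None  # this UID has no mapping


# Hypothetical map: container UIDs 0..65535 live at 100000..165535 outside.
SAMPLE_MAP = "0 100000 65536\n"
```

With that sample map, root inside the container is the unprivileged UID 100000 outside it, which is the "separate UID reserved in the global namespace" part.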
> chroot /app1_root python myapp1.py
Yeah, I think 80% of what Docker actually gets people in practice is a system for managing and running things in chroots. Containers also let you give them separate networking setups, track PIDs properly, and apply resource controls. But I've seen homegrown approximations that preceded Docker, based on stuff like schroot.
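The "separate networking, PIDs" part comes from kernel namespaces, and you can see which ones a process lives in by reading /proc. A small sketch, assuming Linux with /proc mounted (it returns an empty dict otherwise); the function name is mine.

```python
import os


def list_namespaces(pid="self"):
    """Return {namespace_name: identifier} for a process, read from /proc."""
    ns_dir = "/proc/{}/ns".format(pid)
    result = {}
    if not os.path.isdir(ns_dir):
        return result  # not Linux, or /proc is not mounted
    for name in sorted(os.listdir(ns_dir)):
        try:
            # Each entry is a symlink whose target names the namespace instance.
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            pass  # entry vanished or is not readable; skip it
    return result
```

Two processes in the same container report the same identifiers for net, pid, mnt and so on; processes in different containers report different ones.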
Pretty much. It is marketing multiple existing open source Linux technologies (overlay file systems, namespacing for processes, network sockets, chroots) under a set of tools and bam! -- hundreds of millions of dollars in valuation.
Well, more than that, it's packaging them into user-friendly tools and promoting the shit out of it.
Still Docker the company's core value proposition is a hosted registry, something many savvy corporations will never go for. Docker the product could probably do just fine if the company were to fold.
But the hosted registry is what your average distribution already provides in the form of packages.
And with everything moving to services, I see the utility of actually using components diminishing fast (well, except for those providing those services).
But for everyone else, Docker solves no actual problem that can't already be solved now.
virtualenv only works if you want an isolated Python environment. What if you have two things that each want a set of .so libraries with incompatible versions, and one wants node.js? In theory this could be managed with something like NixOS, but containerization is the much more mature and flexible solution.
In that situation you can use LD_LIBRARY_PATH, that's what it is for.
But what you really want in that case is to link the application statically, if you don't want the benefits of shared objects:
- smaller binary
- memory savings (if multiple programs are using the same library, it is loaded once)
- fewer files to patch to fix a security vulnerability
Shared objects have these features, but they come at the price of lower performance, so by putting all the .so files into a single Docker image instead of statically linking your application you're getting the worst of both worlds.
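For the LD_LIBRARY_PATH route mentioned above, a sketch of launching each application with its own library directory searched first. The /opt/... paths in the usage note are placeholders, and the helper names are mine.

```python
import os
import subprocess


def env_with_libs(lib_dirs, base_env=None):
    """Return a copy of the environment with lib_dirs prepended to LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH")
    parts = list(lib_dirs) + ([existing] if existing else [])
    env["LD_LIBRARY_PATH"] = os.pathsep.join(parts)
    return env


def run_with_libs(argv, lib_dirs):
    """Run a command whose dynamic linker searches lib_dirs before the defaults."""
    result = subprocess.run(
        argv, env=env_with_libs(lib_dirs),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

So run_with_libs(["/opt/app1/bin/app1"], ["/opt/app1/lib"]) and the same call for app2 would let the two apps load incompatible versions of the same .so, with no container involved.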
How is that a problem? That's basically the same thing as installing a newer set of tools under your home directory, and using them for an application. This has been done since long before Linux.
(Containers are still good, for isolating concerns and management. Multiple versions of the same library is just not it.)
Yeah, the UNIX-Haters(tm) view of Docker is something like, this is what you get when you give up on static linking, give up on a UNIX spec, and make the mistake of telling people that chroots exist. We could have avoided all of this, and normal application deployment would have just worked with all the benefits Docker gives.
Containerization and virtualization serve different purposes. VMs run actual operating systems within them. A single operating system runs many different containers, that each act something like processes running on that same OS, in a way where they're highly sandboxed and segmented from each other.
If your goal is strong isolation, then VMs are definitely better today. The purpose of Docker and similar container technologies is not that kind of isolation. It's to package up and distribute applications in a way that's more decoupled than simply installing them all on the same system.
I'm not hiding that after I learned about Docker I became a skeptic. It seems like yet another case where people observed what Google was doing and then implemented it wrong.
Google is using containers instead of VMs. This still provides security isolation and allows them to use resources more efficiently (a VM has overhead, since you need a whole OS for every instance).
This approach does not make much sense in a public cloud, where you already run inside a VM and the overhead is really Amazon's, not yours. So I see Docker is now pivoting to be a package manager, but there are already tools that do that. You can argue that Docker is simpler, but so was rpm when it started. As Docker grows, it will become more complex in order to support all the functionality that package formats already provide. There might be an argument that you can run multiple Docker containers on a single host, but that's what processes are for.
There is change happening, and it looks like cloud companies want to create a "cloud OS". I guess Docker is a step in that direction, but in its current state I don't see it offering anything valuable to the organization that uses it.
Docker as a package manager somewhat matches my use, and I think it does its job well. As the packager, I can just create one "package" that any Linux distribution running Docker can install, instead of creating packages for multiple versions of many different distributions.
I think that container images will replace traditional package management in many cases. Package management in Linux has given us many great things:
1) Easy global mirroring of software using "boring" protocols like http and ftp
2) Cryptographic signing of the software so we can trust mirrors and systems that put users in control of who to trust.
3) Human significant package names that are easy to pull onto a host e.g. `apt-get install $name`
Where package management broke down:
1) Package collisions e.g. If I want to install a new custom build of python it replaces the host version and everything may break. The python3 v python2.6 problem.
2) Dependency namespacing. e.g. If I want to rely on a non-official mirror to ship me whizbang project X they could also replace my libc by adding it to the repo because the names and versions collide.
Making sure that we hold onto the good three properties of package management while fixing the two problems is important for Linux moving forward. The last 15 years of Linux was dominated by the centralized package management system and a ton of hacks have developed to work around it when you need a new package or want to install custom software. This is why I spend so much time working on container image specs like appc and oci; I hope we can arrive at a good container image format for the next 15 years that everyone can rely on.
> 1) Package collisions e.g. If I want to install a new custom build of python it replaces the host version and everything may break. The python3 v python2.6 problem.
I'm sorry, but you hit my pet peeve when you used Python as an example. Python was written in a way that lets you have multiple versions installed side by side, and they all work without problems. If you have a python3 package that uninstalls python2.6, then its author screwed something up and I would be afraid to use it at all.
> 2) Dependency namespacing. e.g. If I want to rely on a non-official mirror to ship me whizbang project X they could also replace my libc by adding it to the repo because the names and versions collide.
Is that still an issue? Zypper in openSUSE is quite smart, and for each package it remembers which repo it came from. It tries to satisfy dependencies without changing vendor, and prompts if the only way to satisfy dependencies is to change vendor.
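A toy sketch of that "stick to the vendor" rule, to show why it stops a third-party repo from silently replacing libc. This is not Zypper's actual resolver; the function, the data layout, and the repo names are all invented for illustration.

```python
def pick_candidate(installed_vendor, candidates, allow_vendor_change=False):
    """Choose an upgrade candidate while preferring the installed vendor.

    candidates is a list of (version_tuple, vendor) pairs. Returns the chosen
    pair, or None when only a vendor change would work and it was not allowed
    (the point where a real tool would prompt the user).
    """
    same_vendor = [c for c in candidates if c[1] == installed_vendor]
    if same_vendor:
        pool = same_vendor
    elif allow_vendor_change:
        pool = candidates
    else:
        pool = []
    # Highest version wins within the allowed pool.
    return max(pool, key=lambda c: c[0]) if pool else None
```

With libc installed from "official", a newer libc in a "whizbang" repo is ignored unless the user explicitly allows the vendor change.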
That said, if an application requires a different version of glibc, it would be much better to compile it statically, but even then glibc is supposed to match your kernel version, so you're still risking some incompatibilities.
GNU Guix is a package manager that solves those 2 problems while not sacrificing the 3 good properties. It also offers additional nice features like unprivileged package management, transactional upgrades/rollbacks, full system configuration management, and a tool that can be used like a language-agnostic virtualenv. Built-in Linux container integration is also on the way.
Packaging in the language itself ranges from difficult to impossible for most non-trivial apps. Popular runtimes like Python, Node, and Ruby have many extensions and packages that rely on C and shared libraries. Because of that, a compiled binary package will either not be portable between machines or will require that the machine you are deploying to has a working compiler to build the C package from source. I have seen vast amounts of engineering effort invested into porting C code to slower "native" code just to make packaging work. With containers you can avoid the mess of trying to share the `/` filesystem namespace with the host and concentrate on getting your application working in a chroot that is portable between Linux kernel systems.
I 100% agree with the compile-it-statically thing. This is what many major internet properties like Google do, and I would argue it is part of why Go is so popular. But it is hard for the vast majority of applications, which expect to open files for assets and lack build systems for static compilation.
KVM & VMware are not containerization, they're full virtualization.
There are a lot of benefits to containers and they don't have to be insecure. More efficient resource utilization and orders of magnitude faster allocation and launching to name two.
Google runs a significant portion of its internal operations in a container infrastructure and has for quite a while.[1]
They're perfectly capable of deployment into production environments.
I won't comment on docker as I haven't spent the time to fully grok all its warts.
I know from Joe Beda's talk [1] that they run VMs inside containers for scenarios where they need a managed OS, and that those containers run on bare metal. But I can't speak to the reverse, not being an employee or an authority on Google's internals.
Think of it like a tool for packaging and deploying applications with everything they need. Its purpose is closer to package managers than being a secure sandbox for running untrusted users' VMs.
I'd be inclined to agree. The reductionist sum of mechanisms that make up a Linux container has always been about detaching, multiplexing and partitioning kernel resource subsystems. Docker was the first program that really hyped it into the idea of being about application deployment, but I fear this gives people the wrong impression and makes the mistake of treating an emergent property as if it were a fundamental one.
I have no idea what point you're trying to convey with the second part. That applications are not business logic over kernel resources is a curious argument to make.
That was my initial impression from reading some posts on the Docker site, and perhaps there is no consistent definition of "package manager". To me, though, the most difficult tasks a package manager must do are reproducible builds, dependency management, conflict resolution among transitive dependencies, and the like. But Docker does none of these things, as I understand it.
We[1] believe that the point of Docker is to provide "application packages" (called containers), which is a big step ahead to deliver applications (using their words: build, ship, run).
However, we also believe the isolation containers provide isn't sufficient for multi-tenant usage. This is the main motivation behind Hyper, which runs groups of container images (Pods) as virtual machines.
Any virtualization solution is going to require you to manage an operating system. One of the goals of containerization is for developers to only work with the application.
This. Only 2G, not 200M. I try to get people to package application plus dependencies yet this is what they do every time. Every single time.
Plus they always base it on images from the Internet, so basically we trust some stranger with root privileges to all our data. Not always on the same image, of course.
Yes, this is a fundamental issue with Docker and other container systems that work with raw disk images as their basic unit of information. I have implemented a container system for the GNU Guix package manager that doesn't have this image bloat problem because it doesn't use opaque disk images. We store packages in a content-addressable storage system, which allows us to know the precise dependency graph for any piece of software in the system. Since we know the full set of software needed for any container, we are able to share the same binaries amongst all containers on a single host via a bind mount.
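To illustrate the general idea of sharing through a content-addressed store (and only the general idea: Guix computes its store paths differently, and this class is invented for the example), here is a minimal sketch where identical content is stored once no matter how many containers reference it.

```python
import hashlib
import os


class Store:
    """A toy content-addressed store: the file name is the hash of the content."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def add(self, data):
        """Store bytes and return their path; identical bytes share one path."""
        digest = hashlib.sha256(data).hexdigest()
        path = os.path.join(self.root, digest)
        if not os.path.exists(path):
            with open(path, "wb") as f:
                f.write(data)
        return path

    def count(self):
        """Number of distinct items actually held on disk."""
        return len(os.listdir(self.root))
```

Two containers that both need the same binary end up pointing at the same store path, which is what makes sharing it via a bind mount possible instead of copying it into each image.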
The binaries have to run somewhere. Container evangelists love to espouse the purity of running any container on any dock OS, and this is as true as being able to migrate VMs between any host - it still comes down to what the application/VM needs from the underlying OS/hardware.
Mostly it comes down to lack of experience with containers so far, and lack of tools.
Most apps need very little from the underlying OS if you actually take the time to e.g. set up a toolchain with a build container that you then move the build artefacts out of to install into the final container. Instead you see a lot of containers that in effect include all the build dependencies and a nearly full OS pulled in by that.
If your VMs are single-purpose, you don't necessarily need VMs. Containers are single-process things - they're not running syslog or cron or any of that overhead, for example. Docker is also big on ensuring you are using the literal same artifact in dev as on prod (assuming you change your team's workflow, of course).
Which is the right thing to use is entirely dependent on your use case.
> Containers are single-process things - they're not running syslog or cron or any of that overhead, for example.
You're either being entirely too prescriptive or interchanging containers and Docker freely when they are not equivalent, though one is a particular implementation of the other.
People have been running containers that emulate a full system for quite a while (see FreeBSD jails, illumos/Solaris zones, Linux OpenVZ, LXC, etc.).
If you want Docker + security isolation, I'm intrigued by Clear Containers, which is a lightweight KVM-based virtualization thing:
https://lists.clearlinux.org/pipermail/dev/2015-September/00...
https://lwn.net/Articles/644675/