Do some containers run outside a VM? Docker for example "uses operating-system-level virtualization to develop and deliver software in packages called containers."
I'd personally blame marketing-speak for using "virtualization" at all (unless they mean their Windows/Mac offerings, which can run a Linux VM as the Docker host, on which the containers are run), but I can see how one could stretch a definition of virtualization so that it covers containers.
Sometimes containers are run in VMs, but they are almost by definition the thing that does "not require a full VM running an OS, but instead talks to the host kernel".
One could argue that the key to virtualization is that a piece of software runs in an environment that pretends to be something other than the actual base system. A VM hypervisor runs an operating system so that it looks as if it were running alone on a physical machine, with some fake devices. From inside a container, the environment is similarly fake: it can't see processes outside the container, its view of the file system and devices is modified, and it looks as if the things in the container were the only things on that kernel.
So at its core it's just a set of access permissions + hiding of "forbidden" stuff? How about RAM and stuff, and hardware - does it get a true answer if querying its system? Or is that stuff virtualized too?
Super late but I have a comment[0] that answers this relatively decently, particularly this sentence:
> A docker container is not a VM, it is a regular process, isolated with the use of cgroups and namespaces, possibly protected (like any other process) with selinux/apparmor/etc.
Where virtual machines will actually virtualize a whole machine (down to having BIOS for your imaginary motherboard and a CPU for this imaginary machine), linux containerization virtualizes the resources & environment available to a single running process via the use of namespaces (pid, user, etc) and cgroups (available cpu, memory, etc).
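To make the namespaces/cgroups point a bit more concrete, here is a minimal sketch (Python, Linux only) that lists which namespaces and cgroups a process belongs to by reading `/proc`. The function names are mine, purely for illustration. Run it once on the host and once inside a container: the process inside the container is still just a process, but the identifiers it prints will typically differ from the host's.

```python
import os


def namespace_ids(pid="self"):
    """Return {namespace_name: identifier} for a process, or {} if unavailable."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        names = sorted(os.listdir(ns_dir))
    except OSError:
        return result  # not Linux, /proc not mounted, or no such process
    for name in names:
        try:
            # Each entry is a symlink whose target looks like "pid:[4026531836]"
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # skip entries we are not allowed to read
    return result


def cgroup_membership(pid="self"):
    """Return the lines of /proc/<pid>/cgroup, or [] if unavailable."""
    try:
        with open(f"/proc/{pid}/cgroup") as f:
            return [line.strip() for line in f if line.strip()]
    except OSError:
        return []


if __name__ == "__main__":
    for name, ident in namespace_ids().items():
        print(f"namespace {name}: {ident}")
    for line in cgroup_membership():
        print(f"cgroup: {line}")
```

Two processes that print the same identifier for, say, `pid` share that namespace; a containerized process normally prints a different one than a host shell does.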
So basically, there's a bunch of code in the kernel (shared between all containers) that enables the accurate reporting of all the "virtualized" resources/environment (cpu, memory, other pids running) -- that code can be exploited, which would be a "container escape". Dirty COW[1] is an example of a kernel bug that was used for one of these escapes.
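On the RAM question specifically, a rough sketch of how you could check what a process is told. The assumptions here are mine: a cgroup v2 host with the limit exposed at `/sys/fs/cgroup/memory.max` (cgroup v1 uses a different path), and no extra tooling such as lxcfs in play. In that kind of setup `/proc/meminfo` usually still shows the host's total memory, while the limit actually enforced on the container lives in the cgroup file, so the two numbers can disagree.

```python
def parse_mem_total_kib(meminfo_text):
    """Extract MemTotal (in KiB) from the contents of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    return None


def parse_cgroup_limit_bytes(limit_text):
    """Parse a cgroup v2 memory.max value; 'max' (or empty) means no limit."""
    value = limit_text.strip()
    if not value or value == "max":
        return None
    return int(value)


def read_text(path):
    """Read a file, returning '' if it does not exist or is unreadable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""


if __name__ == "__main__":
    # Paths are assumptions: cgroup v2 mounted at /sys/fs/cgroup.
    total_kib = parse_mem_total_kib(read_text("/proc/meminfo"))
    limit = parse_cgroup_limit_bytes(read_text("/sys/fs/cgroup/memory.max"))
    print("MemTotal reported by /proc/meminfo (KiB):", total_kib)
    print("cgroup memory limit (bytes, None = unlimited/unknown):", limit)
```

If you start a container with a memory limit and run this inside it, comparing the two printed values shows which parts of the environment are "virtualized" for the process and which are not.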
thanks, this was super useful. I thought all docker containers were VM's that were one level less virtualized or something, but still essentially a VM. (So, I thought that docker containers saw a virtual box with a virtual bios, fake ram size, etc etc). thanks for clearing this up for me!
But per the other reply, containers are a lot less "contained" than VMs? I.e. if a program wants to list its set of processes, the host could fuck up and show it some from outside its container - whereas for the same thing to happen in a VM, the VM would need code to read that outside stuff, functionality it might not even contain... so VMs seem safer than containers... is that right?
Yep, VMs are safer than containers, because the barrier between possibly malicious code and the host is larger for a VM than it is for a container. A container is just another process, bound by limitations via namespaces and cgroups -- running on a kernel it shares with the host. But don't take my word for it:
> Simply put, containers are just processes, and as such they are governed by the kernel like any other process. Thus any kernel-land vulnerability which yields arbitrary code execution can be exploited to escape a container. To demonstrate this, Capsule8 Labs has created an exploit that removes the process from its confines and gives it root access in the Real World. Let’s take a look at what was involved.
(I don't know much about Capsule8 as a company, but that article[0] is pretty informative and seems spot on from what I read)
If you can infiltrate a process (let's say a web server) running in a container and know a kernel exploit that can be used to get past these limitations (a "container escape"), then you can use them and get root on the main system.
If that same process was running in a VM (without a container), you need to:
- Infiltrate the process
- Kernel exploit to gain root (assuming the program wasn't running under it) in the VM
- Escape the VM (i.e. use the kernel or whatever else to actually break past the barriers of the hypervisor that was running the VM -- qemu +/- kvm, Hyper-V, etc.) -- aka a "virtual machine escape"[1]
- Gain root on the host system (assuming the process that spawned the hypervisor wasn't running as root)
Generally, virtual machine security is pretty good these days, by virtue of being around longer and having more exposure and eyes looking for exploits.
I'm not sure that is true. I suspect that a great many containers are running in OSs that are in turn running in VMs on hosts in "cloud" structures, perhaps eclipsing the number that are running on an OS on bare metal.