Hacker News new | ask | show | jobs
by reacharavindh 3120 days ago
> Are you saying that security and isolation is not enticing for ephemeral services?

Didn't mean to imply the reverse logic of my statement. I believe Linux Containers (and hence Docker) depend only on Kernel namespaces to provide isolation. In my admittedly naive eyes, they were not good enough/mature to replace my KVM VMs yet. Too much to trade off for little convenience/performance.

However, if Linux containers matured up and offered the same isolation facilities that something like KVM does, then I can think about switching to them in future, and enjoy the performance boost.

>I know that an ephemeral container is reset after a restart but I think that it's a bit naive to think that that is a good enough replacement for true isolation.

If I'm looking to run an application for which I care about solid isolation of resources, I'd spend my time running it as VM. But, if I'm running a one-time script that chews some data and I don't care much about it bothering other workloads in the system or other workloads bothering it, then I'd fall back on the isolation facilities offered by namespaces by using Containers. Nothing wrong with that.

Security view on these is another argument. If I can't afford the application escalating it's view and looking into other workloads in that system, I just wouldn't run them in Containers today.

1 comments

Sounds like you are looking for something closer to LXD or perhaps Rkt.
But, AFAIK, LXD and RKt are similar to Docker as a container runtime though. They all share the host kernel, and if one container is hosed/tainted, your host kernel becomes the attack vector. If I read correctly, hypercontainer/kata containers lets you bring your own kernel for your containers and isolates it from the host using intel hardware features(same ones that KVM leverages). That's where it gets interesting to me.
Kata Containers uses KVM; QEMU, which is the userspace KVM client, is configured so that it looks like you are running on a container.

However, what you get is indeed a virtual machine. It is simply impossible for "real" containers to provide the same isolation as virtual machine, simply because the attack surface is that of the shared kernel; a hypervisor presents a much more constrained interface to a VM than the full kernel, even if you add QEMU to the mix.

Silly question: if the KVM is using para-virtualized drivers and there is a vulnerability in same, then the host kernel would still be vulnerable?
Many of KVM's paravirtualized drivers run in QEMU, and in turn that is usually running heavily confined, for example using SELinux.

So it's true that, as in the famous Theo de Raadt rant, virtualization overall adds to the attack surface compared to containers. But it would also be stupid to ignore that it also introduces very important bottlenecks: to get to the hypervisor, you have to break KVM which is only ~60000 lines of code; getting remote code execution in QEMU might be easier, but then you also have to break the host kernel from a process that has access to almost nothing on the system.

There is also vhost, which is implementing PV drivers in the host kernel. This however is also a small amount of code, and it is generally used only for networking and AF_VSOCK sockets.