Hacker News
by necovek 44 days ago
You are obviously right that these are similar in principle: a VM isolation exploit would lead to the same exposure as a container isolation exploit.

VMs are considered vastly better because the surface area where exploits can happen is smaller and/or better isolated within the kernel.

If you are arguing the latter is not true (and we are all collectively hand-waving away a big chunk of the surface area, so that may be the case), it would help to be explicit about why you believe an exploit in that area is similarly likely.


> I would say it's the fact that "not a security boundary" appears to be a pass/fail statement, whereas the reality is more like a security continuum, along which VMs are further than containers.
I believe that is tautologically true, and thus not a very useful framing.

Security is obviously a continuum (e.g. you can have a bug in your IPMI firmware, and a network packet could break in without any interaction with the OS; or there could be a hardware bug), but there is a discrete "jump" between containers and VMs, large enough that it is useful to call one a security boundary and the other not. Just as a firewall is a security boundary even though it can have security bugs.

The point is whether this jump in exploitable surface area warrants the distinction: many believe it does.

But you also cannot just hand-wave the difference away with "it's a continuum". I did not use absolutes; I said "VMs are _better_ for security", which already implies a continuum.

Containers are mostly used as a deployment/packaging model, while VMs are typically used where stronger security is needed. This has been the established industry standard for a while. Look at the major cloud providers, for example.

AWS:

> Unless explicitly stated, AWS does not consider a container or primitives such as an ECS task or a Kubernetes pod to be a security boundary. A notable exception to this is ECS tasks running AWS Fargate, where the isolation boundary is a task. To account for this, we recommend that you use Fargate with ECS if your applications have strict isolation requirements.

> When you’re using the Fargate launch type, each Fargate task has its own isolation boundary and does not share the underlying kernel, CPU resources, memory resources, or elastic network interface with another task.

They further recommend using separate EC2 instances for even higher security requirements, which you can also run on dedicated hardware, etc. But the fact that you can increase isolation beyond VMs does not make containers the same as VMs.

https://aws.amazon.com/blogs/security/security-consideration...
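To make the Fargate recommendation concrete, here is a minimal sketch (assuming Python 3 and, for the optional last step, boto3 with AWS credentials) that builds the parameters for an ECS task definition targeting Fargate. The function name `fargate_task_definition`, the family name and the image are hypothetical placeholders; the field names are those of ECS `RegisterTaskDefinition`.

```python
import json


def fargate_task_definition(family: str, image: str) -> dict:
    """Build ECS RegisterTaskDefinition parameters for a task that runs on Fargate.

    `family` and `image` are hypothetical placeholders supplied by the caller.
    """
    return {
        "family": family,
        # Ask ECS to place this task on Fargate, where (per the AWS post quoted
        # above) each task gets its own isolation boundary.
        "requiresCompatibilities": ["FARGATE"],
        # Fargate tasks must use the awsvpc network mode (one ENI per task).
        "networkMode": "awsvpc",
        # Fargate requires task-level CPU and memory; 256 units / 512 MiB is a valid pair.
        "cpu": "256",
        "memory": "512",
        "containerDefinitions": [
            {"name": family, "image": image, "essential": True},
        ],
    }


if __name__ == "__main__":
    params = fargate_task_definition(
        "isolated-app", "public.ecr.aws/docker/library/nginx:latest"
    )
    print(json.dumps(params, indent=2))
    # With credentials configured, the same dict could be registered via:
    #   boto3.client("ecs").register_task_definition(**params)
```

The point is only that the isolation level is a deployment choice: the same container image gets a per-task boundary when it is scheduled on Fargate rather than onto shared EC2 container instances.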

GCP:

> There’s one myth worth clearing up: containers do not provide an impermeable security boundary, nor do they aim to. They provide some restrictions on access to shared resources on a host, but they don’t necessarily prevent a malicious attacker from circumventing these restrictions. Although both containers and VMs encapsulate an application, the container is a boundary for the application, but the VM is a boundary for the application and its resources, including resource allocation.

> If you're running an untrusted workload on Kubernetes Engine and need a strong security boundary, you should fall back on the isolation provided by the Google Cloud Platform project. For workloads sharing the same level of trust, you may get by with multi-tenancy, where a container is run on the same node as other containers or another node in the same cluster.

https://cloud.google.com/blog/products/gcp/exploring-contain...

> Applications that run in traditional Linux containers access system resources in the same way that regular (non-containerized) applications do: by making system calls directly to the host kernel.

> One approach to improve container isolation is to run each container in its own virtual machine (VM). This gives each container its own "machine," including kernel and virtualized devices, completely separate from the host. Even if there is a vulnerability in the guest, the hypervisor still isolates the host, as well as other applications/containers running on the host.

> gVisor is more lightweight than a VM while maintaining a similar level of isolation. The core of gVisor is a kernel that runs as a normal, unprivileged process that supports most Linux system calls. This kernel is written in Go, which was chosen for its memory- and type-safety. Just like within a VM, an application running in a gVisor sandbox gets its own kernel and set of virtualized devices, distinct from the host and other sandboxes.

https://cloud.google.com/blog/products/identity-security/ope...
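A quick way to observe the "system calls directly to the host kernel" point is a minimal sketch like the following, assuming a Linux machine with Python 3. The helper name `kernel_identity` is hypothetical. Run it on the host and inside a regular container: since `uname(2)` is answered by whichever kernel services the process's system calls, a plain container typically reports the host's kernel release, while a VM (or a gVisor sandbox, which implements its own kernel) typically reports a different one.

```python
import os


def kernel_identity() -> dict:
    """Return the kernel identity reported to this process by the uname(2) syscall."""
    u = os.uname()  # Unix-only; answered by the kernel that handles our syscalls
    return {"sysname": u.sysname, "release": u.release, "version": u.version}


if __name__ == "__main__":
    info = kernel_identity()
    # Compare this line between the host and a container started on that host:
    # a matching release suggests both share the same kernel.
    print(f"{info['sysname']} {info['release']}")
```

Matching output is only a hint, not proof of shared isolation, but it illustrates why a kernel bug reachable from inside a regular container is a host kernel bug.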

These guys are experts at securing workloads on shared infrastructure, and while various techniques provide different levels of isolation, the current industry practice is not to consider regular Linux containers a security boundary.