

by chrisseaton 1536 days ago

Containers are really a kind of process isolation: you still share a kernel. You can find a lot of people saying that containers aren't enough for running untrusted user code.

If you run a fully virtualised instance you get your own kernel and aren’t relying on process isolation.

Would you be happy if your cloud provider was running your containers on the same virtual instance as someone else's? Most people wouldn't be.


csmpltn 1536 days ago

The only meaningful difference between breaking out of a process-isolated "container" and a full-blown VM is what's waiting for you outside once you've broken out. Whether it's kernel/OS or a bare metal hypervisor isn't really all that meaningful: exploits and vulnerabilities exist for either.

There should be proper hardware-level isolation here, depending on the scenario. Most cloud companies can't afford that though, because they're not rolling out their own hardware.


staticassertion 1536 days ago

> Whether it's kernel/OS or a bare metal hypervisor isn't really all that meaningful: exploits and vulnerabilities exist for either.

This is just not true, or at least it's extremely disingenuous.

Container isolation relies on the Linux kernel. Other than seccomp-denied syscalls (which aren't a thing in k8s by default) any program in the container has full access to the kernel. The Linux kernel has massive attack surface, especially to root users.

VM isolation like Firecracker is much safer. The attack surface is considerably lower. For one thing, you can isolate the process in the guest just as well as you could outside, further limiting attack surface. But more importantly, an attacker either has to attack:

1. Firecracker

2. KVM

Both are very small codebases.

Firecracker is:

1. Written in Rust.

2. Sandboxed aggressively.

KVM has basically never had a public guest-to-host breakout. You can read about one of the very few here: https://googleprojectzero.blogspot.com/2021/06/an-epyc-escap...

So, to recap, we have "security boundary relies on a fully exposed Linux kernel" and "security boundary relies on hardened, tiny, security-driven programs".

It is not even close.

> There should be proper hardware-level isolation here, depending on the scenario. Most cloud companies can't afford that though, because they're not rolling out their own hardware.

Hence hardware vendors building hypervisor support into the hardware itself.


chrisseaton 1536 days ago

Genuinely, would you be happy with just container isolation between you and other customers of your cloud provider?

Most people absolutely would not.


csmpltn 1536 days ago

> "Genuinely, would you be happy with just container isolation between you and other customers of your cloud provider? Most people absolutely would not."

But that's exactly how VPS hosting works today: you don't get your own private blade unless you're ready to pay premium prices and have the competence needed to run it yourself. The technicalities of how private resources in a VPS are isolated from each other will differ, but the concept remains the same nonetheless.

People bite the bullet, only to be exposed to things like Rowhammer [1] or other container escape scenarios [2].

The top comment in this thread reflects the proper way of dealing with this: containers or sandboxes should not be treated as a security boundary.

[1] https://www.usenix.org/conference/usenixsecurity16/technical...

[2] https://www.intezer.com/blog/research/how-we-escaped-docker-...


detaro 1535 days ago

No, VPS hosting is not usually container-based today once you leave the utter bargain-bin offers. The difference between VM isolation and container isolation is quite significant.


chrisseaton 1535 days ago

> But that's exactly how VPS hosting works today

No, VPS is isolation by virtualisation, not containerisation.

The clue is in the V in the name.
