| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by chrisseaton 1535 days ago
	I don't think this is the bottomless pit that you think it is. A virtualised instance is a lot more secure than a container, and it's probably fine to stop at virtualised instances.

1 comments

csmpltn 1535 days ago

A lot more secure? In what ways?

link

chrisseaton 1535 days ago

Containers are really a kind of process-isolation - you still share a kernel. You can find a lot of people saying that containers aren’t enough for running untrusted user code.

If you run a fully virtualised instance you get your own kernel and aren’t relying on process isolation.

Would you be happy if your cloud provider was running your containers on the same virtual I stance as someone else’s? Most people wouldn’t be.

link

csmpltn 1535 days ago

The only meaningful difference between breaking out of a process-isolated "container" and a full-blown VM is what's waiting for you outside once you've broken out. Whether it's kernel/OS or a bare metal hypervisor isn't really all that meaningful: exploits and vulnerabilities exist for either.

There should be proper hardware-level isolation here, depending on the scenario. Most cloud companies can't afford that though, because they're not rolling out their own hardware.

link

staticassertion 1535 days ago

> Whether it's kernel/OS or a bare metal hypervisor isn't really all that meaningful: exploits and vulnerabilities exist for either.

This is just not true, or at least it's extremely disingenuous.

Container isolation relies on the Linux kernel. Other than seccomp-denied syscalls (which aren't a thing in k8s by default) any program in the container has full access to the kernel. The Linux kernel has massive attack surface, especially to root users.

VM isolation like Firecracker is much safer. The attack surface is considerably lower. For one thing, you can isolate the process in the guest just as well as you could outside, further limiting attack surface. But more importantly, an attacker either has to attack:

1. Firecracker

2. KVM

Both are very small codebases.

Firecracker is:

1. Written in Rust.

2. Sandboxed aggressively.

KVM has basically never had a public guest to host breakout. You can read about one here, https://googleprojectzero.blogspot.com/2021/06/an-epyc-escap...

So, to recap, we have "security boundary relies on a fully exposed Linux kernel" and "security boundary relies on hardened, tiny, security-driven programs".

It is not even close.

> There should be proper hardware-level isolation here, depending on the scenario. Most cloud companies can't afford that though, because they're not rolling out their own hardware.

Hence hardware building hypervisor support in.

link

chrisseaton 1535 days ago

Genuinely, would you be happy with just container isolation between you and other customers of your cloud provider?

Most people absolutely would not.

link

csmpltn 1535 days ago

> "Genuinely, would you be happy with just container isolation between you and other customers of your cloud provider? Most people absolutely would not."

But that's exactly how VPS hosting works today - you don't get your own private blade unless you're ready to pay premium prices and have the competence needed to run them yourself. The technicalities of how private resources in a VPS are isolated from each other will differ, but the concept remains the same nonetheless.

People bite the bullet, only to be subject to things like rowhammer [1], or other container escape scenarios [2].

The top comment in this thread reflects the proper way of dealing with this: containers or sandboxes are may not be treated as a secure boundary.

[1] https://www.usenix.org/conference/usenixsecurity16/technical...

[2] https://www.intezer.com/blog/research/how-we-escaped-docker-...

link

detaro 1534 days ago

No, VPS hosting is not usually container-based today once you leave the utter bargain-bin offers. The difference between VM isolation and container isolation is quite significant.

link

chrisseaton 1535 days ago

> But that's exactly how VPS hosting works today

No, VPS is isolation by virtualisation, not containerisation.

The clue is in the V in the name.

link

raesene9 1535 days ago

So I'll start by saying that security is always relative and what's ok for one environment won't be for another :)

The challenge with Linux containers as used by Docker/Containerd/CRI-O et al, is that containers run against a shared Linux kernel. The Linux kernel has a very large attack surface, so it's easier for attackers to find some way to bypass the restrictions it tries to enforce. If you look at this year there have been several Local Privilege Escalation issues in the Linux Kernel, some of which have allowed for container breakout.

If you compare this to a hardened hypervisor (e.g. Firecracker) there is a much smaller attack surface visible from inside the container. It obviously could have a breakout vuln. but there is a lower chance of that occurring.

link

michaelt 1535 days ago

Developers working with docker are almost always in the 'docker' group on their local machine, which is functionally equivalent to running everything as root.

link

staticassertion 1535 days ago

This doesn't matter if the attacker is in the container. It just means that if the attacker is outside of the container they have a trivial privesc to root on the host.

link