by jsiepkes 1571 days ago
> My point is that if it were designed from the ground up with the hard security boundary in mind, would we have ended up with containers in the first place?

It might have looked like FreeBSD jails or illumos/Solaris Zones, both of which are containers designed as a security boundary from the start.


I'm here to push back on the fabled security powers of ground-up security-focused shared-kernel isolation. People love to bring up Zones and Jails in these conversations, presumably since both are much more coherent designs than Linux namespaces, MAC, BPF and cgroups, which are now comparably (if not more) featureful, but shambolic and hard to reason about. But none of these systems are sufficient for multitenant isolation. It would not be OK to rely on Zones for a major multitenant compute workload.
> But none of these systems are sufficient for multitenant isolation. It would not be OK to rely on Zones for a major multitenant compute workload.

You can definitely run hostile workloads securely in zones next to each other. Joyent ran a public cloud on zones and there are still smaller cloud providers who do.

In the Sun Solaris days, zones even earned a number of high-profile security certifications (if you care about such things).

And Joyent had problems doing that:

https://news.ycombinator.com/item?id=27078349

There's nothing you can do to "certify" zones to mitigate this. The problem is that zone cotenants share a kernel. You have to trust that the kernel attack surface is free of LPEs, and no reasonable person can trust that.

I don't see how zone-escape bugs and the like necessarily prove that the concept doesn't work.

Chrome has also had its fair share of sandbox escapes and zero-click remote code execution exploits. Does that mean you can't have a browser? By those standards, if even Google can't get it right, we "mere mortal developers" might as well quit altogether.

> The problem is that zone cotenants share a kernel.

Even with a "hardware" VM they share a kernel (it's just called a hypervisor). And although they share it to a lesser extent, VM escapes happen too. The VMware and KVM security advisories are a testament to that.

The Chrome sandbox would also be problematic for these workloads, for similar reasons! The point of isolated kernels is to foreclose on whole large classes of vulnerabilities. The problem of shared-kernel isolation is that you opt into them.

In the status quo ante of Firecracker, there were colorable arguments that hypervisors had comparably large attack surfaces to containers and jails and zones. But that's mostly out the window now: you can write a mostly memory-safe hypervisor and give it a tiny attack surface by providing only minimal support for virtio devices --- the big challenge with legacy hypervisor stacks is that they were designed to support things like desktop Windows, rather than being scoped down to serverside Linux.
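To make the "tiny attack surface" point concrete, here is a minimal Python sketch of the idea of scoping a VMM's device model down to a short allowlist. The device names, the `MINIMAL_DEVICES` set, and the `VmConfig`/`validate` helpers are hypothetical illustrations in the spirit of Firecracker, not Firecracker's actual API or exact device list.

```python
from dataclasses import dataclass

# Hypothetical allowlist: a scoped-down VMM that emulates only a few
# paravirtual (virtio) devices plus a serial console. Illustrative only,
# not Firecracker's real configuration format or device set.
MINIMAL_DEVICES = frozenset({"virtio-net", "virtio-block", "virtio-vsock", "serial"})


@dataclass(frozen=True)
class VmConfig:
    """Hypothetical VM description: just the list of requested devices."""
    devices: tuple[str, ...]


def validate(config: VmConfig, allowed: frozenset[str] = MINIMAL_DEVICES) -> list[str]:
    """Return the requested devices that fall outside the allowlist, sorted.

    Every emulated device is host code the guest can poke at, so each
    rejected device is attack surface the VMM never has to expose.
    """
    return sorted(d for d in config.devices if d not in allowed)


if __name__ == "__main__":
    server = VmConfig(devices=("virtio-net", "virtio-block", "serial"))
    desktop = VmConfig(devices=("virtio-net", "vga", "usb-xhci", "sound-hda"))
    print(validate(server))   # []
    print(validate(desktop))  # ['sound-hda', 'usb-xhci', 'vga']
```

A server-side Linux guest boots fine with only the allowlisted devices, while a desktop-style guest asks for legacy hardware (VGA, USB, sound) that a scoped-down VMM simply refuses to emulate.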

Or HP-UX vaults grown out of Tru64.