by tptacek 1870 days ago
The problem with this jails/zones stuff is that I don't know anyone who seriously trusts jails and zones for real multitenant workloads anyways. The dealbreaker problem remains a shared kernel attack surface between tenants. It's one thing to propose that Zones are better than namespaces (they probably are), but another thing to cross the threshold where the distinction is meaningful in practice.

At Joyent, we deployed public-facing multitenant workloads based on zones (and before that, jails) for many years. We seriously trusted it -- and had serious customers who seriously depended on it. So, now you know someone!
To be fair, y'all had some serious vulnerabilities, including zone escapes and arbitrary kernel memory reads, discovered by @benmmurphy.
Yes, though I would like to believe that Ben's responsible disclosure coupled with our addressing those vulns (and auditing ourselves for similar) reflect exactly that seriousness around multitenant security. And for whatever it's worth, one of those vulnerabilities -- which was a bug in my code! -- very much informed my own thinking about the inherent unsafety of C, underscoring the appeal of Rust. So I am grateful in several dimensions!
If you have a kernel implemented in Rust, (1) you should shout that from the rooftops and (2) use whatever isolation mechanism you like on it.
They're starting with the bootloader and management engine. That's a tough enough ocean to boil.

Give them some time to get Rust above that.

Sadly, Apple decided on a safe dialect of C for similar purposes (e.g. iBoot), where they could have gone with Swift or Rust instead.

Very big ocean indeed.

To this, all I can say is that I spent 2005-2014, and then 2016-2020, doing nothing but security evaluations of products, probably about 60% of which were server-side multitenant SaaS systems of one form or another, and I don't remember ever evaluating (or overseeing the evaluation of) a system that relied on Jails or Zones. Lots of Docker! And, until a few years ago, multitenant Docker isolation was an infamous joke! I'm not sticking up for it!

You can look at the recent history of Linux kernel LPEs --- there has been sort of a renaissance because of mobile devices --- and count all the ways any shared-kernel multitenant system would have broken down. At the end of the day, it's not so much about predicting whether your system can get owned up (it can), so much as: "what do I need to do when there is a kernel LPE announced on my platform". If you're doing shared-kernel isolation, the right answer to that question is usually "fire drill". It's not a noodley thought-leadership kind of question; it's a simple, practical concern.

There were also tons of providers who trusted Linux containers for VPS hosting.
How'd that turn out?
I haven't heard any stories of people being hacked via container escape, but the whole VPS industry was so low-stakes that maybe customers didn't expect good isolation anyway.
And needless to say it became a billion dollar business, with a great product.
They were acquired for $170m.
I stand corrected. Still great product, business and team.
I'm sure they're great. No part of what I have to say about this has anything to do with how competent they are.
Security requirements (and awareness) have increased over the years, have they not?
They definitely have! And we had a (zones-based) public cloud through it all. On that note, Alex Wilson's description of working with Robert Mustacchi on mitigating Meltdown by adding KPTI to illumos[0] definitely merits a read!

[0] https://blog.cooperi.net/a-long-two-months

Also, tools for improving Docker for multi-tenant workloads exist, like gVisor. I don’t think equivalents really exist for jails/zones.
gVisor isn't a shared-kernel multitenant system; it's essentially kernel emulation. It's a much stronger design.
I mostly mean that it is intended to be a solution for running containers from multiple tenants on the same host. Though I do agree that, being essentially a kernel in itself, it is in a bit of a different wheelhouse. It still is a huge value add that you can implement something like that on top of Docker, imo.
You can run container workloads in "real" VMs too; for instance, check out Kata Containers. Containers are a way of packaging applications; confusingly, they happen to also have a reference standard runtime associated with them. But you don't have to use it.
Of course, and that's the value of the abstraction to me. Docker itself obviously has nothing to do with the Linux container technologies that make up the equivalent functionality of FreeBSD jails, but I'm not aware of any equivalent abstraction that works around jails or zones, even though it might be possible. So the way I see containers on Linux is not literally as a composition of kernel features like cgroups or seccomp, but as an abstract thing that can be composed out of various primitives. And in practice, there are a number of different runtimes around it, including Docker clones like Podman, or tools that effectively manage chroots, much closer to what you would do with jails.

That said, I could just be completely wrong, and there could be similar things that can be done using jails and zones. But when I looked around for similar art with FreeBSD jails, either with regards to Docker's style of packaging and distribution, or with regards to additional layers like gVisor, it didn't seem like a thing well-suited to that kind of composition. In comparison, jails, at least, seem kind of like more powerful chroots. To me this is a pretty big difference versus Linux "containers".

My mental model of Zones and Jails is that they are a cleaner, more convenient, less error-prone way of expressing a modern, minimally-privileged, locked down Docker runtime. You won't catch me arguing that Zones aren't better than Docker, but the u->k attack surface is untenable for multitenant workloads.
Similar to Xen?
Much weirder than Xen.
> The dealbreaker problem remains a shared kernel attack surface between tenants.

Also, now, extremely subtle and hard-to-mitigate timing attacks between tenants.

In fairness, that's an attack class that's very difficult to eradicate even with virtualization.