by haberman 3659 days ago
Tell me if I'm missing something, but the premise of Unikernels seems to be that a ring-0 x86 hardware environment is the perfect fit for a universal container/host interface.

Or to put it more charitably, since cloud compute services are based around booting VM images based on this model, we'll just go with it instead of trying to use an abstraction that is actually designed for this.

Correct me if I'm wrong, but it seems to me that the first thing any unikernel is going to do when it boots is switch the (virtualized) CPU out of x86 Real Mode (which all x86 machines boot into for legacy reasons, but virtually no one has needed since circa 1995) into protected mode.

Is it just me or does this seem a little bit crazy?

7 comments

The premise of unikernels is that:

1. The job of an OS is to ensure that multiple programs can run on a single box without interfering with each other.

2. The job of a hypervisor is to ensure that multiple OSes can run on a physical box without interfering with each other.

3. In many cloud deployments, a single VM instance only runs a single user-defined program, which is programmed to a higher-level runtime than the OS (eg. Node.js, JVM, Rails/Django, SQL).

4. Why do we need #1 then?

IMHO, the real interesting stuff happens when you start re-implementing the APIs that we actually program to, without the OS. For example, what if:

1. You could take any command-line ELF executable and build an AMI out of it. This AMI would have an HTTP interface that only accepted connections from certain security groups. It would take in the command-line args via query params, and let you construct a virtual filesystem containing only the files you operate on via request body. Imagine say a compile server that runs Clang on user-defined code and serves the executable back, to be run on its own VM. And the crucial part is - there is no persistent storage on the box, nor any code that would be worth attacking. If there's a bug in the executable and an attacker pwns the box, the worst he can do is corrupt the request. There is no shell. There is no filesystem. There is no TCP stack to make outgoing connections with.

2. You could re-implement Node.js for stateless webservers. Again, you'd have no filesystem; once the initial program starts, it's guaranteed to never touch disk, since it has no disk access. Node does its own scheduling, and this way Node's scheduler doesn't need to fight the OS scheduler. You could store preformatted HTTP packets or response fragments in read-only memory pages and send them out directly via RDMA.

3. You could do a database or search engine that bypasses the filesystem entirely, instead writing directly to raw disk blocks. It can choose these disk blocks based on locality, since it knows the particular index structure and access pattern for the data, and doesn't have to fight the OS's attempts to hide the disk blocks under a file abstraction.

The point of unikernels is to take away stuff - it's not about which mode the CPU boots into, it's about removing all the code that is on a typical cloud computing image but has nothing to do with the job the instance is actually doing. All of this - shell, filesystem, DNS resolvers, etc. - is attack surface for a potential hacker, and it's often overhead when processing.
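To make the compile-server idea in item 1 concrete, here is a minimal Python sketch of turning a request into an argument vector plus an in-memory file table. The `arg` query parameter, the `compile.example` host, and the function name are all made up for illustration; nothing here is taken from a real unikernel toolchain.

```python
from urllib.parse import urlsplit, parse_qsl

def request_to_invocation(url, body_files):
    """Turn a request into (argv, in-memory file table). Hypothetical helper."""
    parts = urlsplit(url)
    # The program name comes from the URL path; every 'arg' query
    # parameter becomes one argv entry, in the order it appeared.
    argv = [parts.path.lstrip("/")]
    argv += [value for key, value in parse_qsl(parts.query) if key == "arg"]
    # The "filesystem" is just a dict built from the request body.
    # Nothing is persisted once the request has been served.
    vfs = {name: bytes(data) for name, data in body_files.items()}
    return argv, vfs

argv, vfs = request_to_invocation(
    "http://compile.example/clang?arg=-O2&arg=-c&arg=main.c",
    {"main.c": b"int main(void) { return 0; }\n"},
)
print(argv)         # ['clang', '-O2', '-c', 'main.c']
print(sorted(vfs))  # ['main.c']
```

The only files the program can see are the ones the caller put in the request, which is the "no persistent storage on the box" property described above.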

I tried to read up on this, but I'm not too familiar with the terminology. Are unikernels the formal name for the idea of running your application 'bare metal'?

In the parent post, does AMI mean Amazon Machine Image, or some Application M____ Interface?

Yeah, "running your application bare metal" is a useful first approximation. Technically, they consist of the toolchain and libraries necessary to replace OS functionality with userspace library calls, which then run on the bare metal. (Or technically, in any practical deployment they would run on a hypervisor, which presents an interface that looks like bare metal.) MirageOS, one of the first unikernel designs, works by statically analyzing an OCaml program to identify OS calls and then only linking in the libraries required to support those particular calls, all of which have been re-implemented from the ground up for security.

Right now, much of the research on unikernels focuses on implementing a POSIX API. In other words, it replaces libc so that instead of eg. write() making a syscall into a kernel, write() inlines the code that the kernel would've run and talks directly to the hardware.
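A rough Python sketch of that idea, with the hardware replaced by an in-memory buffer so it can run anywhere. `ConsoleDevice` and `LibOS` are hypothetical names invented for this example; a real library OS would do this in C or OCaml against an actual device.

```python
class ConsoleDevice:
    """Stand-in for hardware: collects bytes instead of driving a real console."""

    def __init__(self):
        self.output = bytearray()

    def put_bytes(self, data):
        self.output += data
        return len(data)


class LibOS:
    """Hypothetical library OS: file descriptors map straight to device objects."""

    def __init__(self):
        self.console = ConsoleDevice()
        self.fd_table = {1: self.console, 2: self.console}

    def write(self, fd, data):
        # No trap into a separate kernel: this is an ordinary function call
        # that reaches the "driver" in the same address space.
        device = self.fd_table.get(fd)
        if device is None:
            raise OSError(9, "Bad file descriptor")
        return device.put_bytes(data)


libos = LibOS()
written = libos.write(1, b"hello\n")
print(written, bytes(libos.console.output))
```

The application still calls something shaped like write(), but the privilege transition and the general-purpose kernel path behind it are gone.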

IMHO, the real wins for unikernels come when they start implementing higher-level interfaces, eg. Node or Rails or Django or HTTP or SQL or the JVM. Many programs are already written to these frameworks, with no knowledge of (or in some cases, access to) the underlying POSIX APIs, and the frameworks themselves often re-implement a large portion of the OS to create better domain-specific abstractions. Node or Python's asyncio, for example, implement their own schedulers that each run inside a single OS thread. Databases work in terms of pages, built on top of a filesystem; they effectively try to recreate the abstraction of a block device on top of a stream on top of a real block device. Websites often have large quantities of text that are sent back with every request (think of page layout in a templating engine, or JS bundles for a SPA). This data is usually copied and concatenated multiple times within a framework, while a bare-metal-aware web framework would store it in a buffer somewhere and write it out directly to the network card.
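As a small illustration of the "store it in a buffer and write it out directly" point, here is a Python sketch that keeps the static parts of a page in buffers built once and hands them to the socket as a list, instead of concatenating a new response per request. It assumes a Unix-like system (`socket.sendmsg` is not available everywhere), the page fragments are invented, and on a normal OS the kernel still copies the data, so this only shows the shape of the idea, not a true zero-copy path.

```python
import socket

# Static fragments are built once at startup and reused for every request.
STATUS_AND_HEADERS = memoryview(b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n")
LAYOUT_TOP = memoryview(b"<html><body>")
LAYOUT_BOTTOM = memoryview(b"</body></html>")

def send_page(sock, dynamic_body):
    length = len(LAYOUT_TOP) + len(dynamic_body) + len(LAYOUT_BOTTOM)
    content_length = b"Content-Length: %d\r\n\r\n" % length
    # sendmsg takes a list of buffers (scatter-gather), so the fragments
    # are never joined into one big bytes object in user space.
    # A production version would loop to handle partial sends.
    return sock.sendmsg(
        [STATUS_AND_HEADERS, content_length, LAYOUT_TOP, dynamic_body, LAYOUT_BOTTOM]
    )

left, right = socket.socketpair()
sent = send_page(left, b"<p>hi</p>")
received = right.recv(4096)
left.close()
right.close()
print(sent == len(received))
```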

And yes, I meant Amazon Machine Image. Doesn't have to be Amazon, but I'm focused on the pragmatics of how you might deploy a real unikernel to solve problems, and wanted to make the point that you're going to be loading it into Xen or some other cloud hypervisor at the end.

Some (relational) databases go to great lengths to recreate on top of a file system an interface that looks more like the block level storage that's underlying.

A unikernel can cut out the middle man here.
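A minimal sketch of what "cutting out the middle man" could look like: the database addresses fixed-size pages by number directly on the device, with no file layer in between. A `bytearray` stands in for the raw disk here, and `RawPageStore` and the 4096-byte page size are assumptions made for the example.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch

class RawPageStore:
    """Hypothetical page store that maps page numbers straight to device offsets."""

    def __init__(self, device, num_pages):
        self.device = device  # bytearray standing in for raw disk blocks
        self.num_pages = num_pages

    def _offset(self, page_no):
        if not 0 <= page_no < self.num_pages:
            raise IndexError("page out of range")
        return page_no * PAGE_SIZE

    def write_page(self, page_no, data):
        if len(data) > PAGE_SIZE:
            raise ValueError("data larger than one page")
        start = self._offset(page_no)
        # Pad to a full page so the device never changes size.
        self.device[start:start + PAGE_SIZE] = data.ljust(PAGE_SIZE, b"\0")

    def read_page(self, page_no):
        start = self._offset(page_no)
        return bytes(self.device[start:start + PAGE_SIZE])


disk = bytearray(PAGE_SIZE * 8)
store = RawPageStore(disk, num_pages=8)
store.write_page(3, b"row data")
page = store.read_page(3)
print(page[:8], len(page))
```

Because the store picks the page number itself, it can place related pages next to each other, which is the locality argument made earlier in the thread.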

We've gone full circle. Originally there was shared hosting, which hosted your app in ring 3 alongside other users on the same physical machine, with the host OS running in ring 0. Then we got fancy virtualization hardware where the hypervisor ran in ring -1, your VM ran in ring 0 and your apps ran in ring 3. But that's a lot of indirection, so unikernels move your app into ring 0. So now we're basically back at shared hosting, where your app runs one level higher than the host. Except now your app also bundles a partial OS and has weak debugging tools. It does have better isolation than shared hosting, though, so that's a plus.

But containers are basically the same thing but with better debug support and a more familiar OS environment. Problem is containers need to be deployed on metal to be effective, not VMs. Unfortunately not many providers do this yet.

So yeah it is all kinda crazy.

> Unfortunately not many providers do this yet.

Samsung just acquired Joyent, which provides multi-tenant container hosting on bare metal via Illumos and LX-branded zones. So to me, the acquisition further validates that approach.

Based on this blog post, unikernels also need a special hypervisor (presumably with femto-sized VMs with second-granularity billing) so you might as well run containers.
> Tell me if I'm missing something, but the premise of Unikernels seems to be that a ring-0 x86 hardware environment is the perfect fit for a universal container/host interface.

> Or to put it more charitably, since cloud compute services are based around booting VM images based on this model, we'll just go with it instead of trying to use an abstraction that is actually designed for this.

That is my understanding of part of the premise of unikernels. Another is security from having less code, although nothing stops you from having less code with Linux. LEDE/OpenWRT are Linux distributions that are often smaller than the sizes that are advertised for unikernels.

I consider containers using syscalls on a kernel that operates on bare metal to be a better abstraction.

> Correct me if I'm wrong, but it seems to me that the first thing any unikernel is going to do when it boots is switch the (virtualized) CPU out of x86 Real Mode (which all x86 machines boot into for legacy reasons, but virtually no one has needed since circa 1995) into protected mode.

That is only on x86/amd64 systems. It is different on other architectures.

> Is it just me or does this seem a little bit crazy?

The more I learn about unikernels, the more skeptical I become of them.

Xen offers a couple of ways to load and start a kernel, and you'd want to start a kernel in long or protected mode. Ideally, you also use the hypervisor's virtual device interfaces instead of scanning for emulated devices. I don't know if the sane startup protocol is final, but here are some pieces of documentation: http://xenbits.xen.org/docs/4.7-testing/misc/hvmlite.html http://xenbits.xen.org/docs/4.7-testing/misc/pvh.html

The nice thing about ring-0 is that on modern hardware with SR-IOV, a VM can be associated with devices and the multiplexing that had to be done within a kernel or hypervisor can now be done in hardware.

Yeah. The right thing is hosting providers offering Java application servers or equivalent. Unfortunately there are political and commercial reasons that's not happening.

Still, there are worse container/host interfaces. It could be the full suite of POSIX system calls.

> Still, there are worse container/host interfaces. It could be the full suite of POSIX system calls.

Why would a Java application server provided by the host be better than the full suite of POSIX system calls?

My sense is the set of "system calls" (i.e. `native` functions) is smaller and more rigorously specified.
I think the premise of a unikernel is that of a library or app as an OS.

I'm not following why you think turning on protected mode is crazy. Can you elaborate?

Yeah, but if you use containers you have to suffer the indignity of the creat() system call. These seem like the smallest possible objections.
You are assuming that our only options are "x86 hardware ring-0" or "Linux system call interface." Both are crufty in their own ways, but more importantly, neither of these was designed to be this. The right answer might be an interface that is designed with containerization in mind.
Can you explain why the creat() syscall lacks dignity and how that relates to containers?
That was just a joke. I think haberman is spot on that both x86 and Unix are crufty in their own ways and thus cruftiness isn't a good metric to judge these abstractions on.
Oh hah, that's funny. Now I feel silly :)