by nostrademons 3659 days ago
The premise of unikernels is that:

1. The job of an OS is to ensure that multiple programs can run on a single box without interfering with each other.

2. The job of a hypervisor is to ensure that multiple OSes can run on a physical box without interfering with each other.

3. In many cloud deployments, a single VM instance runs only a single user-defined program, which is written against a higher-level runtime than the OS (e.g. Node.js, JVM, Rails/Django, SQL).

4. Why do we need #1 then?

IMHO, the really interesting stuff happens when you start re-implementing the APIs that we actually program to, without the OS. For example, what if:

1. You could take any command-line ELF executable and build an AMI out of it. This AMI would have an HTTP interface that only accepted connections from certain security groups. It would take in the command-line args via query params, and let you construct a virtual filesystem containing only the files you operate on via the request body. Imagine, say, a compile server that runs Clang on user-defined code and serves the executable back, to be run on its own VM. And the crucial part is that there is no persistent storage on the box, nor any code that would be worth attacking. If there's a bug in the executable and an attacker pwns the box, the worst they can do is corrupt the request. There is no shell. There is no filesystem. There is no TCP stack to make outgoing connections with.

2. You could re-implement Node.js for stateless webservers. Again, you'd have no filesystem; once the initial program starts, it's guaranteed to never touch disk, since it has no disk access. Node does its own scheduling, and this way Node's scheduler doesn't need to fight the OS scheduler. You could store preformatted HTTP packets or response fragments in read-only memory pages and send them out directly via RDMA.

3. You could do a database or search engine that bypasses the filesystem entirely, instead writing directly to raw disk blocks. It can choose these disk blocks based on locality, since it knows the particular index structure and access pattern for the data, and doesn't have to fight the OS's attempts to hide the disk blocks under a file abstraction.
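To make the first idea concrete, here is a minimal sketch of the request-handling side: query params become argv, and the request body becomes an in-memory "filesystem" holding only the files for this one run. The `arg` parameter name, the JSON body format, and the `request_to_job` helper are all assumptions made up for illustration; nothing here is an existing tool.

```python
import json
from urllib.parse import parse_qs, urlsplit

def request_to_job(url: str, body: bytes):
    """Turn one HTTP request into (argv, files) for a single program run."""
    query = urlsplit(url).query
    # Repeated "arg" parameters keep their order, so they map onto argv.
    argv = parse_qs(query).get("arg", [])
    # Hypothetical body format: a JSON object mapping path -> file contents.
    spec = json.loads(body or b"{}")
    files = {path: text.encode() for path, text in spec.items()}
    return argv, files

argv, files = request_to_job(
    "/run?arg=-O2&arg=-c&arg=main.c",
    b'{"main.c": "int main(void) { return 0; }"}',
)
```

The dict is the entire filesystem the program ever sees, so there is nothing persistent left to tamper with once the request is done.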

The point of unikernels is to take away stuff - it's not about which mode the CPU boots into, it's about removing all the code that is on a typical cloud computing image but has nothing to do with the job the instance is actually doing. All of this - shell, filesystem, DNS resolvers, etc. - is attack surface for a potential hacker, and it's often overhead when processing.

1 comments

I tried to read up on this, but I'm not too familiar with the terminology. Are unikernels the formal name for the idea of running your application 'bare metal'?

In the parent post, does AMI mean Amazon Machine Image, or some Application M____ Interface?

Yeah, "running your application bare metal" is a useful first approximation. Technically, a unikernel consists of the toolchain and libraries necessary to replace OS functionality with userspace library calls, which then run on the bare metal. (In any practical deployment they would run on a hypervisor, which presents an interface that looks like bare metal.) MirageOS, one of the first unikernel designs, works by statically analyzing an OCaml program to identify OS calls and then only linking in the libraries required to support those particular calls, all of which have been re-implemented from the ground up for security.
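A toy model of that "only link what you call" step, as described above: given the set of OS-level calls a program uses, walk a dependency table and collect just the libraries needed. The tables and library names are invented for the example, and this is not how MirageOS is actually implemented; it only shows the shape of the idea.

```python
# Hypothetical tables: which library implements each OS-level call, and what
# each library itself depends on. None of these names come from MirageOS.
PROVIDERS = {"socket": "tcpip", "resolve": "dns", "open": "blockfs", "now": "clock"}
DEPENDS_ON = {
    "tcpip": ["netdriver", "clock"],
    "dns": ["tcpip"],
    "blockfs": ["blockdriver"],
}

def libraries_to_link(calls_used):
    """Return the transitive closure of libraries needed for the calls used."""
    needed = set()
    stack = [PROVIDERS[call] for call in calls_used]
    while stack:
        lib = stack.pop()
        if lib not in needed:
            needed.add(lib)
            stack.extend(DEPENDS_ON.get(lib, []))
    return needed

print(sorted(libraries_to_link({"socket", "now"})))
# prints ['clock', 'netdriver', 'tcpip']
```

A program that never opens a file never gets `blockfs` or `blockdriver` in its image, so that code cannot be attacked.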

Right now, much of the research on unikernels focuses on implementing a POSIX API. In other words, it replaces libc so that instead of e.g. write() making a syscall into a kernel, write() inlines the code that the kernel would've run and talks directly to the hardware.
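As a rough simulation of what "write() is just a library call" means: below, `write` has the same shape as the POSIX call but is a plain function that runs the driver code itself. `SerialPort` is a made-up stand-in for a device (real hardware would be memory-mapped registers), and the fd table is an assumption for the sketch.

```python
class SerialPort:
    """Stand-in for a device; a real one would be hardware registers."""

    def __init__(self):
        self.transmitted = bytearray()

    def put_byte(self, b: int) -> None:
        self.transmitted.append(b)

CONSOLE = SerialPort()
DEVICES = {1: CONSOLE}  # hypothetical: fd 1 is wired straight to the console

def write(fd: int, data: bytes) -> int:
    """Same shape as POSIX write(), but an ordinary function call:
    no trap into a kernel, no privilege switch, just the driver code."""
    device = DEVICES[fd]
    for b in data:
        device.put_byte(b)
    return len(data)

written = write(1, b"hello\n")
```

Application code that calls `write(1, ...)` does not change; only what sits underneath it does.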

IMHO, the real wins for unikernels come when they start implementing higher-level interfaces, e.g. Node or Rails or Django or HTTP or SQL or the JVM. Many programs are already written to these frameworks, with no knowledge of (or in some cases, access to) the underlying POSIX APIs, and the frameworks themselves often re-implement a large portion of the OS to create better domain-specific abstractions. Node and Python's asyncio, for example, implement their own schedulers that each run inside a single OS thread. Databases work in terms of pages, built on top of a filesystem; they effectively try to recreate the abstraction of a block device on top of a stream on top of a real block device. Websites often have large quantities of text that are sent back with every request (think of page layout in a templating engine, or JS bundles for a SPA). This data is usually copied and concatenated multiple times within a framework, while a bare-metal-aware web framework would store it in a buffer somewhere and write it out directly to the network card.
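On a conventional OS, the closest analogue to that last point is a gather write: keep the static fragments in fixed buffers and hand the list to the socket without joining them first. A minimal sketch on Unix-like systems, using Python's `socket.sendmsg`; the fragment contents and helper names are made up for illustration. A unikernel framework could go further and hand the same buffers to the NIC driver directly.

```python
import socket

# Static fragments are built once at startup and never copied again.
STATUS_AND_HEADERS = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n"
PAGE_TOP = b"<html><body><h1>Hello, "
PAGE_BOTTOM = b"</h1></body></html>"

def response_fragments(name: bytes):
    """Return the response as a list of buffers instead of one joined string."""
    length = len(PAGE_TOP) + len(name) + len(PAGE_BOTTOM)
    dynamic = b"Content-Length: %d\r\n\r\n" % length
    return [STATUS_AND_HEADERS, dynamic, PAGE_TOP, name, PAGE_BOTTOM]

def send_response(sock: socket.socket, name: bytes) -> int:
    # sendmsg() does a gather write: each buffer is read in place,
    # so the framework never concatenates the fragments itself.
    return sock.sendmsg(response_fragments(name))
```

Only the `Content-Length` line and the name are built per request; everything else is the same bytes object every time.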

And yes, I meant Amazon Machine Image. Doesn't have to be Amazon, but I'm focused on the pragmatics of how you might deploy a real unikernel to solve problems, and wanted to make the point that you're going to be loading it into Xen or some other cloud hypervisor at the end.

Some (relational) databases go to great lengths to recreate, on top of a file system, an interface that looks more like the underlying block-level storage.

A unikernel can cut out the middle man here.
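Roughly what that looks like, as a sketch: database page N is simply disk block N, with no file or offset translation in between. `BlockDevice` here is an in-memory stand-in for a raw disk, and both class names and the 4096-byte block size are assumptions for the example.

```python
BLOCK_SIZE = 4096

class BlockDevice:
    """Stand-in for a raw disk: a flat array of fixed-size blocks."""

    def __init__(self, num_blocks: int):
        self.data = bytearray(num_blocks * BLOCK_SIZE)

    def read_block(self, n: int) -> bytes:
        return bytes(self.data[n * BLOCK_SIZE:(n + 1) * BLOCK_SIZE])

    def write_block(self, n: int, block: bytes) -> None:
        assert len(block) == BLOCK_SIZE
        self.data[n * BLOCK_SIZE:(n + 1) * BLOCK_SIZE] = block

class PageStore:
    """Database page N is disk block N: no file, no offset translation."""

    def __init__(self, device: BlockDevice):
        self.device = device

    def write_page(self, page_no: int, payload: bytes) -> None:
        # Pad short payloads with zero bytes up to one full block.
        self.device.write_block(page_no, payload.ljust(BLOCK_SIZE, b"\0"))

    def read_page(self, page_no: int) -> bytes:
        return self.device.read_block(page_no)

store = PageStore(BlockDevice(num_blocks=8))
store.write_page(3, b"row data")
```

Because the store picks block numbers itself, it can place related pages next to each other on disk instead of hoping the filesystem does.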