Hacker News
IncludeOS: C++ unikernel now free and open source (github.com)
136 points by AlfredBratterud 3856 days ago
4 comments

> A minimal bootable image, including bootloader, operating system components and a complete C++ standard library is currently 693K when optimized for size.

WOW. That's just nuts! I mean, I know Linux can be slim, but that's less than a meg!

Really very exciting stuff. I don't know if I should play JC3 or tinker with this.

Should look at systems of the past for inspiration on that. Remember that old systems couldn't waste MB on stuff. So, they got clever. Hansen's Solo had 4KB trusted kernel with whole OS being 21Kloc of readable Pascal with total image size of 110KB. That includes kernel, filesystem, I/O, editor, and two compilers. Result was type-safe, mostly memory-safe, and concurrency-safe (deadlock & race condition free specifically). People could copy and update old approaches like that for OS side of things then they don't even need a unikernel. ;)

Now, wanting strong separation mechanisms and so on would complicate things. The old security kernels, like STOP OS and GEMSOS, used MMUs, segments, MAC policies, covert channel suppression... you name it. STOP was still only 21 Kloc in total:

http://www.cse.psu.edu/~trj1/cse544-s10/slides/cse544-lec9-1...

So, monolithic or micro-kernel route, you can get things way smaller at the foundation than what we use today. Add basic TCP/IP stack, Ethernet driver, and runtime it should still be way smaller than 693K unless that's tons of error or security handling code to increase robustness.

I'm pretty sure you can find MIPS Linux images with BusyBox et al. that weigh less than that. A lot of routers run Linux on 2MB EPROMs, including user space.
I have a port of Alan Cox's Fuzix for the MSP430 where the entire bootable ROM image is 27kB.

(That's actually quite big for a tiny-computer OS, although it's the smallest Unix I know of.)

For those who don't know who Alan Cox is, he was the Linux 2.2 branch maintainer, is still heavily involved in Linux, and also does lots of other neat shit.
I seem to recall the QNX kernel is pretty slim as well.
I used to have a 1.44MB floppy that booted QNX to a desktop environment. Of course, that's not so unusual historically: both Amiga and GEM had single-floppy desktop environments (or did the Amiga require two to get to Workbench?)
That's just the microkernel, though --- it didn't contain any functionality other than sending messages (and a few other things, microkernels vary).

Fuzix' 27kB contains the scheduler, drivers, tty, file system and POSIX subsystem!

Fuzix definitely was end to end. On the other hand, the QNX kernel is a more capable foundation to build on.
EmCraft shows a 594kB Linux image for the Cortex-M4:

http://www.emcraft.com/stm32f429discovery/what-is-minimal-fo...

You won't find anything remotely recent squeezing into a mere 2MB. You can't really get Linux that slim anymore without cutting out major things like the network stack. 4MB is still (just barely) enough ROM for a real router, if you're judicious about what features to include, and don't mind the lack of a browser-based configuration interface.
It still needs a 'real' OS (e.g. Linux) running as the Hypervisor.

The entire concept looks to me like a fancy ELF, because apparently we couldn't solve the problem of packaging software in a portable way before.

Not strictly, no. While currently they only support the "hardware" in KVM/QEMU/VirtualBox, there's nothing stopping anyone implementing drivers for whatever hardware they want to run on, even bare metal.

This probably means integrating whatever Ethernet adapter you have, and some way of sending/receiving stdio, probably over serial or something.

Sideline: I'm particularly intrigued by the idea of implementing drivers for AWS EC2, so you can run this as your AMI.

> Not strictly, no. While currently they only support the "hardware" in KVM/QEMU/VirtualBox, there's nothing stopping anyone implementing drivers for whatever hardware they want to run on, even bare metal.

Exokernels (https://en.wikipedia.org/wiki/Exokernel) essentially implement this. Hardware has been adding more support for virtualization though, so I imagine unikernels are better able to take advantage of things like virtualized page tables, virtualized io-mmu, etc...

I'd be curious to see benchmarks for something generally IO bound like PostgreSQL or HyperTable. IO scheduling isn't trivial, so it might give a good idea of what some of the trade-offs might be.

> I'm particularly intrigued by the idea of implementing drivers for AWS EC2, so you can run this as your AMI.

Yup. Kind of like a skimpy version of OSv.

Yeah, how does this compare to OSv?
The kernel is way, way smaller, and has basically no OS services. OSv is more like BSD-lite.
> It still needs a 'real' OS (e.g. Linux) running as the Hypervisor.

It runs against hardware virtualization... I'm not sure I understand why it needs a "real" OS underneath. At least in theory it should be able to run on bare metal as long as you include logic for whatever device drivers you might need.

Microkernel and RTOS work shows anything it needs can be about as tiny as itself. Tinier, actually, if it's a networked service with a single app running. It already has networking, needs a driver, a few special-purpose components... not much left to add.
This isn't Linux.
I think that was just a comparison, because Linux can be very small.
Oh, wow, this is exactly what I've been working on with a project called "fleet", though they've gotten a lot further than I did. I guess I'll have to consider ditching my code and using theirs instead.
So anyone know how small the L4 kernel is compared to this?
5-10 times smaller for regular ones with it closer to 50 for separation kernels. That's not a fair comparison, though, as you need extra components to support the unikernel or just components on the kernel. A more fair comparison might be L4Re, a stripped GenodeOS, OKL4 w/ necessary software, or NOVA with necessary software.

L4Re was the first attempt I believe at an environment for apps on L4:

https://os.inf.tu-dresden.de/L4Re/overview.html

Any chance it will support Xen, too, in the future? Also, unikernels should preferably be written in memory safe languages like Rust.
> unikernels should preferably be written in memory safe languages like Rust

There is a Rust project. How is that relevant to this?

With a unikernel design there is no isolation between the software components on a single node, of the kind an OS can provide. All the code shares the same memory space.

Isolation can be useful. For example the Postfix email server (running on a Unix, not unikernel) is decomposed into several processes with different privileges. That allows running the most sensitive parts with limited access rights, to protect against attacks, while still running those parts on the same server for efficiency. This process isolation is typically not provided with unikernels.

A lot of unikernels compensate for this by providing language-level protection through a safe high-level language (see MirageOS using OCaml, and others based on Erlang, Haskell, or Rust). Then it's not process isolation but the language and its implementation that provide protection / isolation between components. My understanding is that this is what GP refers to.

A unikernel based on C or C++ will have no process isolation, and no language level isolation either. So sensitive components would have to be split into different nodes, isolated by using either separate VMs or machines. That's doable, but adds complexity to the orchestration and possibly some overhead.

> That allows running the most sensitive parts with limited access rights, to protect against attacks, while still running those parts on the same server for efficiency. This process isolation is typically not provided with unikernels.

Unikernels also don't have notions of access rights. To make that model work, more than memory safety, you need declarative ways of asserting constraints, which... C++ arguably can pull off pretty well.

> A unikernel based on C or C++ will have no process isolation, and no language level isolation either.

C++ has lots of support for language level isolation. Yes, due to C compatibility, if you diddle with memory you can find ways to violate those contracts, but it is entirely possible, particularly in a unikernel context, to prove that you aren't doing that.
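As a rough illustration of what "declarative constraints" could look like in C++, here is a minimal sketch of a capability-token pattern. The names `NetCapability`, `Supervisor`, and `send_bytes` are hypothetical and not taken from IncludeOS or any other project; the check is purely compile-time and only holds as long as the rest of the code base has no memory errors.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical capability token: holding one stands for the right to use
// the network. It carries no data; only its type matters.
class NetCapability {
    // User-provided (not defaulted) private constructor, so the class is
    // never an aggregate and cannot be brace-initialised from outside.
    NetCapability() {}
    friend class Supervisor;
};

// Hypothetical trusted component: the only code allowed to mint tokens.
class Supervisor {
public:
    static NetCapability grant_net() { return NetCapability(); }
};

// A component that was never handed a token cannot call this function,
// because it has no way to construct the first argument.
inline int send_bytes(const NetCapability&, int byte_count) {
    assert(byte_count >= 0);
    return byte_count;  // stand-in for real I/O
}

// The constraint is checked by the compiler, not at run time.
static_assert(!std::is_default_constructible<NetCapability>::value,
              "untrusted code cannot create a capability on its own");
```

A caller would obtain the token with `Supervisor::grant_net()` and pass it along explicitly, so the set of components that can reach the network is visible in the function signatures.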

> That's doable, but adds complexity to the orchestration and possibly some overhead.

That'd add way more complexity and likely bugs to the code than just keeping the codebase clean.

> C++ has lots of support for language level isolation. Yes, due to C compatibility, if you diddle with memory you can find ways to violate those contracts,

No, it's not. The memory unsafety in C++ is fundamental to the language (think iterator invalidation). It has nothing to do with C compatibility and comes up all the time in well-written C++ code in practice.
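For readers who haven't run into iterator invalidation, a minimal sketch of the hazard using only the standard library follows. The function names are made up for illustration. The code deliberately never dereferences a stale iterator (that would be undefined behaviour); it only shows that the storage moves, which is what leaves old iterators dangling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if growing the vector moved its storage. When that happens,
// every iterator, pointer, or reference obtained earlier is invalidated.
inline bool growth_relocates_storage(std::size_t initial_size) {
    std::vector<int> v(initial_size, 0);
    // Record the buffer address as an integer so no dangling pointer value
    // is ever read after the reallocation.
    const auto before = reinterpret_cast<std::uintptr_t>(v.data());
    const std::size_t old_capacity = v.capacity();
    // Push elements until the capacity changes, i.e. a reallocation happened.
    while (v.capacity() == old_capacity) {
        v.push_back(1);
    }
    const auto after = reinterpret_cast<std::uintptr_t>(v.data());
    return before != after;
}

// One common defensive habit: keep an index, not an iterator, across
// operations that may reallocate.
inline int read_first_after_growth() {
    std::vector<int> v{42};
    const std::size_t idx = 0;  // stays meaningful after reallocation
    for (int i = 0; i < 1000; ++i) {
        v.push_back(i);
    }
    assert(idx < v.size());
    return v[idx];
}
```

Nothing in the type system stops code from holding `v.begin()` across the `push_back` calls, which is why this class of bug shows up in otherwise tidy C++.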

Possible, but not necessarily easy; what tools do you use to sweep C++ code for 'unsafe' constructs?
I'm interested in the answer to that, too. A member of the Chrome team asked what static analysis or advanced verification tools I thought they could use in a significant C++ project. Digging around, I think I found just one limited tool, plus two ways of doing Design-by-Contract (asserts & OOP). That was it. Not inspiring lol.

Now, there has been work on type-safe or memory-safe versions of C++. They're non-standard. They also get smashed when a memory error occurs and that will happen. So, suggesting to rely on language-based isolation in C++ is more a joke than something worth trying.

Good example of work on C++ safety:

https://www.cis.upenn.edu/~eir/papers/2013/ironclad/paper.pd...

Around memory safety, it would simply be dereferencing any raw pointer. You'd whitelist a set of smart pointer classes that manage that, and you can pretty much just use the Clang parser to catch it.
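To make the "whitelisted smart pointers" half of that concrete, here is a minimal sketch using only `std::shared_ptr` and `std::weak_ptr` from the standard library. The `Packet` type and the function names are hypothetical. It does not show the Clang-based check itself, only the style of code such a check would permit: no raw pointer is ever dereferenced, and a stale handle is detected rather than used.

```cpp
#include <cassert>
#include <memory>

struct Packet {
    int payload;
};

// Non-owning observers hold a std::weak_ptr rather than a raw Packet*.
// lock() yields an empty shared_ptr once the owner has released the object,
// so a stale handle is rejected instead of dereferenced.
inline bool try_read_payload(const std::weak_ptr<Packet>& observer, int& out) {
    if (std::shared_ptr<Packet> live = observer.lock()) {
        out = live->payload;
        return true;
    }
    return false;
}

// Walks through one owner/observer lifetime. Returns true if the observer
// was refused access after the owner released the packet.
inline bool stale_handle_is_rejected() {
    std::shared_ptr<Packet> owner = std::make_shared<Packet>();
    owner->payload = 7;
    std::weak_ptr<Packet> observer = owner;

    int value = 0;
    const bool read_while_alive = try_read_payload(observer, value);
    assert(read_while_alive && value == 7);

    owner.reset();  // the packet is destroyed here
    return !try_read_payload(observer, value);
}
```

This covers lifetime errors on heap objects only; it says nothing about iterator invalidation or out-of-bounds indexing, which a raw-pointer ban would not catch.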
And Erlang projects, and projects for every other programming language you could possibly want. Just use whatever you want!