by jrv 3659 days ago
It seems this article gleefully admits many of the downsides of unikernels mentioned in https://www.joyent.com/blog/unikernels-are-unfit-for-product..., while being very brief and naive about the upsides (mainly the very contested security argument).

I admittedly haven't studied the whole unikernel space yet, but intuitively they do seem unfit for production unless we spend a decade rebuilding tooling (debuggers, process diagnostics tools, etc.). And even then, other downsides apply, as laid out in the Joyent article.

Happy to change my mind over time if it proves to be the other way around, but for now I'm very skeptical.


I wouldn't say that unikernels were entirely undebuggable. I spent a few hours hacking and came up with a proof of concept dom0 profiler, and learned some debugging benefits: one symbol table for the entire binary, one place to turn on frame pointers for everything, etc.

http://www.brendangregg.com/blog/2016-01-27/unikernel-profil...
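The "one symbol table for the entire binary" point can be illustrated with a small sketch. This is not the profiler from the linked post; it only shows how sampled instruction pointers could be resolved against a single sorted table. The symbol names, addresses, and sample values are all hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical symbol table: (start address, name), sorted by address.
# In a unikernel, application, library, and kernel code would share one table.
SYMBOLS = [
    (0x1000, "tcp_input"),
    (0x1800, "http_parse"),
    (0x2400, "gc_mark"),
    (0x3000, "app_handler"),
]
STARTS = [addr for addr, _ in SYMBOLS]


def resolve(ip):
    """Map a sampled instruction pointer to the nearest symbol at or below it.

    Simplification: there are no end addresses, so anything at or above the
    last start address resolves to the last symbol.
    """
    i = bisect_right(STARTS, ip) - 1
    return SYMBOLS[i][1] if i >= 0 else "[unknown]"


def profile(samples):
    """Count samples per symbol, most frequent first."""
    return Counter(resolve(ip) for ip in samples).most_common()


# Hypothetical instruction-pointer samples, as a dom0 sampler might collect.
samples = [0x1004, 0x2410, 0x2480, 0x3010, 0x0800, 0x2500]
for name, count in profile(samples):
    print(f"{count:3d}  {name}")
```

With a conventional OS, the same lookup would need separate tables for the kernel, each shared library, and the application, selected by address space and mapping.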

That requires having access to dom0, which is not available on Amazon EC2 (for good reason). Running in UNIX binary mode in a container on bare metal seems like the way to go here. That does not have the overhead of hardware virtualization. That means no internal memory fragmentation from partitioning RAM, the ability to avoid duplicate caching, and hypervisor upcalls that become normal system calls. At that point, the unikernel is just a normal UNIX process that can be debugged with conventional tools.

We already get hypervisor statistics from dom0, via CloudWatch. Could that include on-demand profiling as well? I don't see why not. And that's not the only way to solve this.

So work needs to be done to make unikernels profileable & debuggable. I wouldn't claim that this was impossible.

I do not dispute that it is possible to profile/debug unikernels on future cloud infrastructure. However, I am skeptical that unikernels offer any benefit to merit the work of enabling that when the whole system is considered.

System calls might be additional overhead when there is a hypervisor, but hypervisors are unnecessary when we have containers. You stand to eliminate much more overhead by eliminating the hypervisor than by eliminating the syscalls. Some of that overhead is internal fragmentation from memory partitioning, duplication of driver code, potential double caching, etc.
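A rough sketch of the internal fragmentation argument, with made-up numbers: four workloads on one host, either in fixed-size VM partitions or sharing one kernel's memory pool. It assumes static partitions and ignores ballooning, overcommit, and memory hotplug, all of which narrow the gap in practice.

```python
def stranded_with_partitions(partition_mb, used_mb):
    """Memory reserved for guests but unused; no other guest can borrow it."""
    return sum(partition_mb - u for u in used_mb)


def fits_partitioned(partition_mb, demand_mb):
    """Per workload: does its demand fit inside its own fixed partition?"""
    return [d <= partition_mb for d in demand_mb]


def fits_pooled(host_mb, demand_mb):
    """With one shared pool, only the total demand has to fit on the host."""
    return sum(demand_mb) <= host_mb


# Hypothetical figures: a 16384 MB host split into four 4096 MB partitions.
HOST_MB = 16384
PARTITION_MB = 4096
used = [1200, 3500, 800, 2900]

print("stranded in partitions:", stranded_with_partitions(PARTITION_MB, used), "MB")

# The second workload grows to 5000 MB.
demand = [1200, 5000, 800, 2900]
print("fits per partition:", fits_partitioned(PARTITION_MB, demand))
print("fits in shared pool:", fits_pooled(HOST_MB, demand))
```

In this toy case the host has plenty of free memory in total, yet the grown workload does not fit its partition, while the shared pool absorbs it.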

The industry is in the early stages of a transition from hardware virtualization to containers because containers are a better abstraction than hardware virtualization. Joyent offers illumos zones, Swisscom offers Docker containers with Flocker (full disclosure: my employer is the author of Flocker), Microsoft has deployed Drawbridge on their Azure cloud, etc. We will only see more of this in the future.

Once the transition is complete, I see no advantage to unikernels. You could use them in UNIX binary mode, but that makes them little more than a standard process on a traditional system. That is a very different role than the one that their creators intended for them.

Price/performance always wins.

Can an application unikernel on a hypervisor (which is really a lightweight OS that nowadays supports many pass-through features) beat the performance of an application on a regular OS? (I didn't mention containers, since they should ultimately be irrelevant to hot path performance.) So can it? With a lot of work, I bet it can.

So who has the better price/performance? That's going to depend on how much engineering work it is to adopt, fix, and use unikernels, when they are competing with an established ecosystem around Linux and containers. And that may be where unikernels actually lose on price/performance, where price includes total cost of ownership. We'll see!

I suspect that the performance of a unikernel on a hypervisor vs an application on a POSIX system is somewhat analogous to the performance of hardware RAID vs ZFS. The performance of the abstraction used in the former is inherently worse than the performance of the abstraction used in the latter. The example of memory partitioning gives the latter a price/performance advantage and it is not the only one. I would be happy to be proven wrong though.

Hypervisors offer decent security and performance guarantees, which means they are good for sharing resources among potentially hostile customers. Their simple resource semantics and small ABI make for a fairly secure abstraction.

Kernels do as well. Both have had security exploits that lead to privilege escalation. As containerization matures, I expect the security of a container host to become similar to that of a hypervisor. They are essentially doing the same thing. The only place where they differ is the kind of abstraction that they use.

LPARs/LDOMs are a much more secure abstraction for "sharing resources among potentially hostile customers". Those physically partition at the hardware. LPARs are used on the IBM mainframes and are "EAL5 Certified". LDOMs are the SPARC equivalent, but I do not know their EAL. Both traditional kernels and various hypervisors are EAL 4 (some are called EAL4+), which is not as secure.

There is nothing stopping people from creating a unikernel for a dynamic language that also includes the development tools.

A Lisp Machine on Xen would be one model.

I feel like Erlang-based unikernels are an extremely compelling alternative to traditional UNIX deployments. Immutable systems with safe hot swap and excellent debugging tools like `observer` and `debugger`.

Erlang on Xen is already that way. You can use the full Erlang profiling/tracing/debugging/observing toolkit on an EoX node.

That's great, but with processes (containerized or not), I can use the full variety of standard diagnostic tools built over many decades. Of course you can start building similar tools for unikernels, but it's gonna take you a long time before you get to a similar state.

To counter the OP, you'd have to use an existing profiler.

See the discussion about that post over at https://news.ycombinator.com/item?id=10953766

Yep, I did. It still doesn't really change my view, especially on the front of operability/debuggability. If you have ever operated really large services, you know how big of a problem the lack of that is.