Hacker News
by ryao 3659 days ago
I do not dispute that it is possible to profile/debug unikernels on future cloud infrastructure. However, I am skeptical that unikernels offer any benefit to merit the work of enabling that when the whole system is considered.

System calls might be additional overhead when there is a hypervisor, but hypervisors are unnecessary when we have containers. You stand to eliminate much more overhead from eliminating the hypervisor than you stand from eliminating the syscalls. Some of that overhead is internal fragmentation from memory partitioning, duplication of driver code, potential double caching, etcetera.

The industry is in the early stages of a transition from hardware virtualization to containers because containers are a better abstraction than hardware virtualization. Joyent offers Illumos zones, Swisscom offers Docker containers with Flocker (full disclosure: my employer is the author of Flocker), Microsoft has deployed Drawbridge on their Azure cloud, etcetera. We will only see more of this in the future.

Once the transition is complete, I see no advantage to unikernels. You could use them in UNIX binary mode, but that makes them little more than a standard process on a traditional system. That is a very different role than the one that their creators intended for them.


Price/performance always wins.

Can an application unikernel on a hypervisor (which is really a lightweight OS that nowadays supports many pass-through features) beat the performance of an application on a regular OS? (I didn't mention containers, since they should ultimately be irrelevant to hot-path performance.) So can it? With a lot of work, I bet they can.

So who has the better price/performance? That's going to depend on how much engineering work it is to adopt, fix, and use unikernels, when they are competing with an established ecosystem around Linux and containers. And that may be where unikernels actually lose on price/performance, where price includes total cost of ownership. We'll see!

I suspect that the performance of a unikernel on a hypervisor vs an application on a POSIX system is somewhat analogous to the performance of hardware RAID vs ZFS. The performance of the abstraction used in the former is inherently worse than the performance of the abstraction used in the latter. The example of memory partitioning gives the latter a price/performance advantage and it is not the only one. I would be happy to be proven wrong though.
Strawman. Let's talk about Xen and unikernels. So you said:

> You stand to eliminate much more overhead from eliminating the hypervisor than you stand from eliminating the syscalls.

So my hypervisor application talks directly to devices, thanks to pass-through. What's that about syscalls again?

It is not just syscalls. You either have inefficiency from double caching or inefficiency from a lack of a global page replacement algorithm. You also have internal fragmentation from memory partitioning, which prevents you from running as many applications and/or reduces memory available for cache. I consider these to be fundamental disadvantages.
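
To put a rough shape on the internal fragmentation point, here is a minimal sketch. The guest sizes are made-up illustrative figures, not measurements, and `stranded_memory` is a hypothetical helper: it just adds up the memory each fixed partition reserves but does not currently use, which no other guest can borrow.

```python
def stranded_memory(partitions):
    """Sum of reserved-but-unused memory across fixed partitions.

    partitions: list of (reserved_mib, used_mib) tuples, one per guest.
    """
    return sum(reserved - used for reserved, used in partitions)


# Hypothetical guests: each reserves 4096 MiB but uses a different amount.
guests = [(4096, 2500), (4096, 3900), (4096, 1200)]

stranded = stranded_memory(guests)
total_reserved = sum(reserved for reserved, _ in guests)
print(f"{stranded} MiB of {total_reserved} MiB is reserved but idle")
```

On a shared kernel that idle memory could in principle go to another process or to the page cache. If every guest really does use all of its reservation, the sum is zero and the argument does not apply.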
> You either have inefficiency from double caching

No double caching: most of our applications have an in-memory working set, and no disk state. Some do (eg, Cassandra databases).

> from a lack of a global page replacement algorithm

Oh, so it's more efficient to be running where paging (aka swapping) is allowed? Sure, for memory footprint, but for runtime performance you're banking on it reaching a state where paging is minimal. The amount of memory saved is depending on the working set, maybe a lot, maybe a little. One downside is you're paying a small CPU tax to manage this (maintaining kswapd lists, and scanning them).

I think this would sometimes be a benefit, and sometimes not. And if not, is there anything stopping a Unikernel -- which must manage its own memory anyway -- from implementing its own pager?

The inefficiency isn't technical resources, but human resources: having Unikernel engineers reinvent what modern kernels already do.

> You also have internal fragmentation from memory partitioning, which prevents you from running as many applications and/or reduces memory available for cache.

Again, usually no file system cache in use. And most apps are started with a fixed heap size that consumes all of memory. There's no left-over/wasted memory that could be used by other apps.

If you want to page out cold memory to make room, uh, sure, but see previous comment. I bet that sometimes works, and sometimes doesn't.

> No double caching: most of our applications have an in-memory working set, and no disk state. Some do (eg, Cassandra databases).

I see no disadvantage for unikernel-on-hypervisor setups versus application-on-container-host setups in cases where there is no disk state. However, I see no advantage either. The techniques used to talk to hardware directly work in userland too. netmap is a fantastic example of this.

I had expected unikernels on hypervisors to have a disadvantage against a container on a traditional kernel, but after reading your remarks, I think that the two ought to perform identically (at least where there is no file system IO), with neither being theoretically better. However, the world is adopting containers in traditional kernels and unless a unikernel on a hypervisor can be better, I do not see much value in devoting resources to unikernels too.

> Oh, so it's more efficient to be running where paging (aka swapping) is allowed? Sure, for memory footprint, but for runtime performance you're banking on it reaching a state where paging is minimal. The amount of memory saved is depending on the working set, maybe a lot, maybe a little. One downside is you're paying a small CPU tax to manage this (maintaining kswapd lists, and scanning them).

I was referencing cache efficiency when I talked about page replacement algorithms rather than paging to disk. Imagine a global ARC algorithm in a traditional system versus each unikernel having its own. The global hit rate would be better with a global algorithm than it would be with a local algorithm in each unikernel.
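
A toy simulation of the global-versus-local point, as a sketch only. It uses plain LRU rather than ARC to keep it short, and the two-tenant workload is hand-picked so that one tenant needs more than half the cache and the other needs less. The names `LRUCache`, `workload`, `global_hit_rate`, and `partitioned_hit_rate` are all made up for this example.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal LRU cache that only tracks hit statistics."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.accesses = 0

    def access(self, key):
        self.accesses += 1
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return
        self.entries[key] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used


def workload(rounds):
    """Tenant A cycles over 6 blocks, tenant B over 2 (assumed pattern)."""
    for i in range(rounds):
        yield ("A", i % 6)
        yield ("B", i % 2)


def global_hit_rate(total_capacity, rounds):
    """One cache shared by both tenants."""
    cache = LRUCache(total_capacity)
    for tenant, block in workload(rounds):
        cache.access((tenant, block))
    return cache.hits / cache.accesses


def partitioned_hit_rate(total_capacity, rounds):
    """The same total capacity split evenly into one cache per tenant."""
    caches = {t: LRUCache(total_capacity // 2) for t in ("A", "B")}
    for tenant, block in workload(rounds):
        caches[tenant].access(block)
    hits = sum(c.hits for c in caches.values())
    accesses = sum(c.accesses for c in caches.values())
    return hits / accesses


print(global_hit_rate(8, 600))       # about 0.993: all 8 blocks fit
print(partitioned_hit_rate(8, 600))  # about 0.498: tenant A thrashes
```

With 8 slots shared, all 8 distinct blocks fit and only the cold misses remain. Split 4 and 4, tenant A cycles 6 blocks through 4 slots and misses every time while tenant B leaves 2 slots idle. A different workload could narrow the gap, and a badly behaved tenant can also pollute a shared cache, so treat this as an illustration of the mechanism rather than a benchmark.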

Even if your application does its own caching, the principle of a global algorithm being best ought to apply to filesystem metadata.

> Again, usually no file system cache in use. And most apps are started with a fixed heap size that consumes all of memory. There's no left-over/wasted memory that could be used by other apps.

This is not the sort of application that I had in mind. I am still skeptical that unikernels are better, but I agree that they are not worse here. In this case, it seems to me that they are (theoretically) just a different way of doing things and are not better or worse.

Hypervisors offer decent security and performance guarantees, which means they are good for sharing resources among potentially hostile customers. Their simple resource semantics and small ABI make for a fairly secure abstraction.
Kernels do as well. Both have had security exploits that lead to privilege escalation. As containerization matures, I expect the security of a container host to become similar to that of a hypervisor. They are essentially doing the same thing. The only place where they differ is the kind of abstraction that they use.

LPARs/LDOMs are a much more secure abstraction for "sharing resources among potentially hostile customers". Those physically partition at the hardware. LPARs are used on the IBM mainframes and are "EAL5 Certified". LDOMs are the SPARC equivalent, but I do not know their EAL. Both traditional kernels and various hypervisors are EAL 4 (some are called EAL4+), which is not as secure.

I don't think kernels are inherently less secure than hypervisors, but as they stand, current hypervisor implementations have a better security track record than kernels. The basic point that I am trying to make is that both hypervisors and kernels are just pieces of software meant for partitioning and sharing hardware. Software that has simpler and smaller interfaces also has a lower probability of having bugs that lead to vulnerabilities. I agree that there are better hardware partitioning implementations out there, but unfortunately they are not so popular. I am looking forward to having formally verified kernels like seL4 become more popular.
Kernels usually provide quite a lot of abstraction in addition to secure partitioning and sharing. And that's arguably wrong: providing abstractions is complicated (thus inherently less secure), and one size does not fit all.

In a unikernel setup abstractions can live much more comfortably in libraries.

IBM terminology might be confusing me - but looking at published security targets, it appears LPARs themselves have only ever been evaluated at EAL4 with flaw remediation (ALC 2), with PR/SM being evaluated at EAL 5, but neither to any specific protection profile. This means that IBM created their own evaluations and gave themselves a "certification".

CC evaluations without a protection profile are worthless in the eyes of most governments and CC schemes, but kudos to IBM product management and marketing for creating competitive FUD.

As of a year ago, LDOMs (Oracle VM for SPARC) hadn't had a CC evaluation, and I'm not seeing anything currently in evaluation. Solaris Zones have been evaluated under the Solaris OSPP EAL4 + extensions evaluation.

The biggest reason that virtualization technologies haven't had a CC evaluation with a protection profile is that no US NIAP approved protection profile existed and the draft ones that were circulated were crap.

Assurance levels (EAL) are deprecated for the newest NIAP protection profiles, as the higher assurance levels (EAL4) were cost- and time-prohibitive for vendors to complete before the product was outdated. Many people wrongly think Common Criteria is a security evaluation (free of bugs) - it's not - it's a security architecture evaluation (is the documented behavior working correctly).

There is a schism in CC - everything is changing - anything we know today is wrong and will change.

TL;DR: Common Criteria is a joke and doesn't actually mean what you think it does.