Demikernel: A library OS for kernel-bypass devices, now with Rust TCP/IP stack (github.com)
35 points by drkp 2437 days ago
3 comments

There is a huge amount of work in this area, under a huge number of names for what is always a pretty similar concept: parakernels, exokernels, isokernels, now demikernels.

The common thread is that allowing the OS to interpose itself in the data path makes it impossible to operate devices at full speed, so applications need to bypass the OS. At the same time, you need some control over hardware access and permissions, and some degree of hardware abstraction. How to draw the line between application performance, on one side, and OS abstraction and hardware sharing, on the other, is endlessly negotiated.

Those of us actually operating hardware at maximum rates, today, often use proprietary vendor libraries to bypass the kernel. These tend to have an OS module to map hardware resources into user process space, and a library to operate those resources without adding overhead. That typically involves a ring buffer and spin-loop polling on isolated cores, the kind of thing we were taught in school indicated primitive system design. The result is that our applications have a hundred or six lines of custom code for each vendor's gadget, which has to be extended as new vendors enter and old ones retire.
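To make the ring-buffer-plus-polling pattern concrete, here is a minimal sketch in Rust. It is not any vendor's API: `Ring`, `push`, `try_pop`, and `poll` are hypothetical names, and the "device" side is just another caller of `push`. A real library would map the ring out of device memory and pin the polling thread to an isolated core.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};

/// Hypothetical single-producer, single-consumer descriptor ring.
/// Capacity must be a power of two so indices can be masked instead of divided.
pub struct Ring {
    slots: Vec<AtomicU64>,
    mask: usize,
    head: AtomicUsize, // next slot the consumer will read
    tail: AtomicUsize, // next slot the producer will write
}

impl Ring {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity.is_power_of_two());
        Ring {
            slots: (0..capacity).map(|_| AtomicU64::new(0)).collect(),
            mask: capacity - 1,
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        }
    }

    /// Producer side (stands in for the device). Returns false when the ring is full.
    pub fn push(&self, desc: u64) -> bool {
        let tail = self.tail.load(Ordering::Relaxed);
        let head = self.head.load(Ordering::Acquire);
        if tail.wrapping_sub(head) == self.slots.len() {
            return false;
        }
        self.slots[tail & self.mask].store(desc, Ordering::Relaxed);
        // Release publishes the slot write before the new tail becomes visible.
        self.tail.store(tail.wrapping_add(1), Ordering::Release);
        true
    }

    /// Consumer side: one non-blocking check, no syscall, no interrupt.
    pub fn try_pop(&self) -> Option<u64> {
        let head = self.head.load(Ordering::Relaxed);
        let tail = self.tail.load(Ordering::Acquire);
        if head == tail {
            return None;
        }
        let desc = self.slots[head & self.mask].load(Ordering::Relaxed);
        // Release hands the slot back to the producer only after it was read.
        self.head.store(head.wrapping_add(1), Ordering::Release);
        Some(desc)
    }

    /// Spin-loop polling: check the ring up to `max_spins` times before giving up.
    pub fn poll(&self, max_spins: usize) -> Option<u64> {
        for _ in 0..max_spins {
            if let Some(desc) = self.try_pop() {
                return Some(desc);
            }
            std::hint::spin_loop();
        }
        None
    }
}
```

The spin budget is only there so the sketch terminates; on a dedicated core you would typically loop forever.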

eBPF access to devices is one interesting wrinkle on this, holding out a hope of mainstream portability without compromising performance, running user code directly on target hardware.

http://irenezhang.net/papers/demikernel-hotos19.pdf

Excerpt:

"Researchers have long predicted the demise of the operating system [21, 26, 41]. As datacenter servers increasingly incorporate I/O devices that let applications bypass the OS kernel (e.g., RDMA [12] and DPDK [15] network devices or SPDK storage devices), this prediction may finally come true. While kernel-bypass devices do eliminate the OS kernel from the I/O path, they do not handle the kernel’s most important job: offering higher-level abstractions. This paper argues for a new high-level, device-agnostic I/O abstraction for kernel-bypass devices. We propose the Demikernel, a new library OS architecture for kernel-bypass devices."

That's the WHY of the Demikernel...
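As a rough illustration of what a "device-agnostic I/O abstraction" could look like from the application side, here is a small Rust sketch. To be clear, this is not the Demikernel API: `IoQueue`, `LoopbackQueue`, and `echo_all` are made-up names, and the in-memory loopback backend is a stand-in for a real RDMA, DPDK, or SPDK backend.

```rust
use std::collections::VecDeque;

/// Hypothetical device-agnostic queue interface (an assumption, not Demikernel's API).
pub trait IoQueue {
    /// Submit a buffer. Returns false if the backend cannot accept it right now.
    fn push(&mut self, buf: Vec<u8>) -> bool;
    /// Non-blocking completion check.
    fn pop(&mut self) -> Option<Vec<u8>>;
}

/// Stand-in backend: a bounded in-memory loopback instead of a real device.
pub struct LoopbackQueue {
    depth: usize,
    pending: VecDeque<Vec<u8>>,
}

impl LoopbackQueue {
    pub fn new(depth: usize) -> Self {
        assert!(depth > 0, "queue depth must be non-zero");
        LoopbackQueue { depth, pending: VecDeque::new() }
    }
}

impl IoQueue for LoopbackQueue {
    fn push(&mut self, buf: Vec<u8>) -> bool {
        if self.pending.len() == self.depth {
            return false;
        }
        self.pending.push_back(buf);
        true
    }

    fn pop(&mut self) -> Option<Vec<u8>> {
        self.pending.pop_front()
    }
}

/// Application logic written once against the trait, not against a vendor library.
pub fn echo_all<Q: IoQueue>(q: &mut Q, msgs: &[&[u8]]) -> Vec<Vec<u8>> {
    let mut out = Vec::new();
    for m in msgs {
        // When the queue is full, drain one completion and retry.
        while !q.push(m.to_vec()) {
            if let Some(done) = q.pop() {
                out.push(done);
            }
        }
    }
    while let Some(done) = q.pop() {
        out.push(done);
    }
    out
}
```

The point of the sketch is only that `echo_all` never names a device; swapping hardware would mean adding another `impl IoQueue`, not touching the application.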

> While kernel-bypass devices do eliminate the OS kernel from the I/O path, they do not handle the kernel’s most important job: offering higher-level abstractions

Well, Solarflare's OpenOnload lets you bypass the kernel via LD_PRELOAD and change literally none of your code.

That can work, but if you want maximum performance, you need to use the ef_vi library in OpenOnload, which needs (a fair bit of) custom code. Exablaze has libexanic, Napatech has their thing. libexanic is surprisingly elegant, and you can do a lot of the work (e.g. extracting a timestamp) while the rest of a packet arrives. Netronome has an eBPF way to run packet-handling code right in the NIC, maybe even freeing up a core.

Solarflare has ruled the roost, but Xilinx bought them out, and the future of their NICs is cloudy. Mellanox used to be a big deal; now they are part of NVIDIA. Mellanox and Solarflare (like Napatech) have spent a great deal of effort to make kernel bypass work for clients running in VMs.

Yeah. Funnily enough, I wrote a library on top of ef_vi for UDP sending, with mixed results. It was just UDP/multicast dispatch, and it felt a little like grappling with an underdeveloped library. Pretty fast, but I'm sure I hit some kind of memory barrier bug at one point.

The thing with Onload is the sheer simplicity, which makes it an easy sell. It's also handy to tweak socket options with env vars. They also have a direct TCP lib if you need some extra nanos. Templated sends too.

I've not heard of Netronome and never used libexanic.

These approaches mean you have to invest more in external passive monitoring tools, since the OS stats are of little help.

What devices is it compatible with?