The cost of a system call [pdf] (cs.cmu.edu)
126 points by caustic 3745 days ago
10 comments

The FlexSC paper from 2010.

They observe the cache damage from traditional system calls and propose batching them in a queue and, ideally, using a different core to service them. This is not the traditional Unix programming model, so they create a threading package that transparently makes your traditional Unix synchronous system calls work. They benchmark Apache with all of this new apparatus and it performs very well.

A callback-oriented kernel call mechanism. Hmm. The callback-oriented framework people should love this. It looks like you have to keep polling the shared page to see when your system call is done, though.
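
To make the "keep polling the shared page" point concrete, here is a toy model of a syscall page in plain Python. It is only a sketch of the general idea, not the paper's data layout: the entry fields, the three status values, the `HANDLERS` table and the `SyscallPage` class are all names made up for illustration, and the "kernel side" is just an ordinary method call rather than a separate core.

```python
import os
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    FREE = 0
    SUBMITTED = 1
    DONE = 2


@dataclass
class Entry:
    status: Status = Status.FREE
    name: str = ""       # stand-in for a syscall number (assumption)
    args: tuple = ()
    ret: object = None


# Hypothetical "syscall table", only here so the model can run.
HANDLERS = {"getpid": os.getpid, "add": lambda a, b: a + b}


class SyscallPage:
    def __init__(self, size=4):
        self.entries = [Entry() for _ in range(size)]

    def submit(self, name, *args):
        """User side: fill in a free entry. No mode switch happens here."""
        for index, entry in enumerate(self.entries):
            if entry.status is Status.FREE:
                entry.name, entry.args, entry.ret = name, args, None
                entry.status = Status.SUBMITTED
                return index
        raise RuntimeError("syscall page is full")

    def service(self):
        """Kernel side: execute every submitted entry as one batch."""
        handled = 0
        for entry in self.entries:
            if entry.status is Status.SUBMITTED:
                entry.ret = HANDLERS[entry.name](*entry.args)
                entry.status = Status.DONE
                handled += 1
        return handled

    def poll(self, index):
        """User side: (True, result) once the entry is done, else (False, None)."""
        entry = self.entries[index]
        if entry.status is not Status.DONE:
            return False, None
        result = entry.ret
        entry.status = Status.FREE   # hand the slot back for reuse
        return True, result
```

A caller would `submit` several requests, go do other work, and `poll` later; nothing tells it when `service` has run, which is exactly the polling cost the comment is pointing at.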

It's painful to realize that, after a context switch, modern CPUs can need 11,000 cycles to get back to full speed, with the right stuff in the caches and pipelines. Maybe we need CPUs which handle context switches better.

A lot of that is dumping cache and then trying to refill it after the context switch. Regarding context switches, one of the things that they suggested is to pin one core to stay in the system context and just handle servicing syscalls, etc. Given a modern server can have up to a few hundred logical cores, that's not as big of a thing to ask for as it was even a few years ago. Even "cheap" servers these days have 8-16, so pinning there might even make sense as well.
That's what I did in some of my designs. It was more about covert channel mitigation, by ensuring the secrets and the untrusted stuff used separate CPUs. The side benefit was better performance from fewer cache flushes. It works.
This isn't something I've used, but the low latency people on the Mechanical Sympathy mailing list seem to talk a lot about pinning cores to particular processes. This paper seems to have a lot of the same underlying considerations.
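
For anyone who wants to try the pinning part from user space, a minimal sketch in Python follows. It only restricts the calling program to one core; it does not dedicate a core to servicing syscalls the way the paper suggests. `os.sched_getaffinity` and `os.sched_setaffinity` are real but only available on some platforms (Linux in particular), and the helper name `pin_to_one_core` is made up here.

```python
import os


def pin_to_one_core():
    """Pin the caller to one of the cores it is currently allowed to use.

    Returns (previous, current) affinity sets, or None when the platform
    has no sched_setaffinity or refuses the request.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None                       # e.g. not on Linux
    previous = os.sched_getaffinity(0)    # pid 0 selects the caller
    target = min(previous)                # arbitrary choice: lowest allowed core
    try:
        os.sched_setaffinity(0, {target})
    except OSError:
        return None                       # sandbox or policy said no
    return previous, os.sched_getaffinity(0)
```

Picking the lowest-numbered allowed core is arbitrary; a real low-latency setup would also keep other work off that core, which this sketch does not attempt.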
Huh? On x86, a syscall/sysret pair takes a couple hundred cycles, doesn't serialize, and doesn't zap caches.

It's the other ugly state changes that hurt a lot. Switching address spaces burns a few hundred cycles and zaps the TLB (fix coming in Linux 3.7, maybe). Interrupts are a few thousand cycles.

It is something to take into account if you're using PV-based virtualization in Xen, however: system calls will require a context switch.
x86 already somewhat does; it's just not implemented in any OS kernel as of now. The kernel calling the program (async syscalls) has been mainstream within exokernels since their inception.
I found this paper incredibly interesting, and I think I'd love to work in research in this area. Does anyone have some resources to learn more/keywords to search? I'm currently a biomedical engineering undergraduate, so the most relevant course I've had has been Digital Logic, which I absolutely loved and did very well in, but I'd really appreciate advice on additional courses to try to take.
On the most basic level, you should be able to write a simple operating system from scratch. I heard http://pages.cs.wisc.edu/~remzi/OSTEP/ is good for an introduction to OS writing.

Some papers I read in no particular order:

Synthesis OS (http://valerieaurora.org/synthesis/SynthesisOS/) might be interesting for you. They do lots of runtime code synthesis.

Exokernels (follow links from https://en.wikipedia.org/wiki/Exokernel#Bibliography). And more recently Mirage (https://mirage.io/) and HaLVM (https://github.com/GaloisInc/HaLVM).

(I assume you already know how to program. Otherwise, brush up on that as step 0. C is still the canonical choice for OS work. But if you are feeling adventurous there's more choice.)

The interesting bit I was eagerly anticipating is buried way down in "3.1 Exception-Less Syscall Interface." How'd they do it? Syscall pages. This sounds really interesting, though apparently not terribly new.

I'd be really concerned about trust issues, but I'm sure it could be done safely. Lots of room for corner cases, especially w/NUMA.

We should just let V8 run in the kernel and do away with system calls. https://www.destroyallsoftware.com/talks/the-birth-and-death...
This is cool but it certainly makes things more complicated. It adds a kernel mode thread per process.
I think you can do it with just one kernel mode thread for all processes, using one (or more) pages of memory per process/thread. That kernel thread can read all the pages, but each page can only be read by its own process.

It looks like this is not what the article's implementation does, but I think it would be possible.
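
A rough model of that alternative, again in Python and again purely hypothetical (the class names are invented, `sum` stands in for real kernel work, and Python cannot actually enforce the isolation, it is only modelled by what each handle holds):

```python
class PageHandle:
    """What a process gets: access to its own page and nothing else."""

    def __init__(self, page):
        self._page = page

    def submit(self, *args):
        self._page.append({"args": args, "result": None})
        return len(self._page) - 1

    def result(self, index):
        return self._page[index]["result"]   # None until serviced


class SyscallThread:
    """A single servicing loop that can see every process's page."""

    def __init__(self):
        self._pages = {}   # pid -> that process's page

    def register(self, pid):
        page = []
        self._pages[pid] = page
        return PageHandle(page)

    def service_all(self):
        done = 0
        for page in self._pages.values():
            for slot in page:
                if slot["result"] is None:
                    slot["result"] = sum(slot["args"])   # stand-in for real work
                    done += 1
        return done
```

The obvious trade-off is that one loop serializes everybody's requests, so a slow call from one process delays all the others.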

This is fantastic. Any plans to get this into the mainline Linux kernel?
www.google.com/patents/US20140149781
Red Hat has the patent... that's good, right?
How is that valid, if the paper and implementation are from 2010?
This seems like the kind of thing you'd expect to see implemented in Redox.[0]

[0] http://www.redox-os.org/

How so? This talks about changing how syscalls work (batching, callbacks, pinning the syscall handler to one core), but it doesn't look like Redox does anything special with syscalls (https://doc.redox-os.org/doc/kernel/syscall/index.html):

> The system call interface is very similar to POSIX's system calls

iirc the xok exokernel had the same thing, by doing 'scheduler activations' through a vdso. How does FlexSC handle cancellation points? What about latency?
TempleOS is ring-0-only. No system call overhead.