HN Mirror

Hacker News


	by rorhug 2740 days ago
	Would someone care to summarise the possible benefits of being able to do polled I/O without entering the kernel?

farazbabar 2740 days ago

In traditional I/O, a hardware interrupt is triggered whenever data arrives at the hardware boundary, and the interrupt can be serviced by any core that is available to the scheduler. Imagine the overhead involved: context-switching out whatever that core was doing, setting up the registers, moving the data, and then handing the core back to the OS. In this model, by contrast, dedicated cores serve I/O through a memory-mapped, ring-buffer-like data structure sized to your application's needs. There is no allocation/deallocation overhead, no management beyond moving a pointer, and no context switching. If you can spare the cores, this can significantly improve performance.
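
To make the ring-buffer idea concrete, here is a minimal sketch in Python. It is an editorial illustration, not code from the thread or from any kernel API; `SpscRing`, `EMPTY` and `run_demo` are hypothetical names. A producer thread stands in for the NIC/driver, and the consumer loop plays the dedicated polling core. Python shows the mechanics (preallocated slots, head/tail counters, spinning instead of sleeping), not the speed.

```python
import threading

EMPTY = object()  # sentinel: "nothing to read yet"


class SpscRing:
    """Single-producer/single-consumer ring buffer.

    Slots are allocated once; the two sides coordinate only by advancing
    the head/tail counters. (CPython's GIL orders these updates; a C version
    would need release/acquire atomics.)
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two")
        self.mask = capacity - 1
        self.slots = [None] * capacity  # preallocated, never resized
        self.head = 0  # items written so far (only the producer changes it)
        self.tail = 0  # items read so far (only the consumer changes it)

    def try_push(self, item) -> bool:
        if self.head - self.tail == len(self.slots):
            return False  # full
        self.slots[self.head & self.mask] = item
        self.head += 1  # publishing is just a counter bump
        return True

    def try_pop(self):
        if self.tail == self.head:
            return EMPTY  # empty: the caller simply polls again
        item = self.slots[self.tail & self.mask]
        self.tail += 1
        return item


def run_demo(n_items: int = 10_000, capacity: int = 256) -> int:
    ring = SpscRing(capacity)

    def producer() -> None:  # stands in for the NIC/driver filling the ring
        for i in range(n_items):
            while not ring.try_push(i):
                pass  # ring full: spin

    t = threading.Thread(target=producer)
    t.start()
    total = received = 0
    while received < n_items:  # the dedicated "I/O core": poll, never sleep
        item = ring.try_pop()
        if item is EMPTY:
            continue
        total += item
        received += 1
    t.join()
    return total
```

The consumer never blocks: when the ring is empty it just checks again, which is why this pattern only pays off when you can afford to give a whole core to it.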

In one use case, I was able to quadruple performance on a 32-core Xeon by installing four 10 Gbps Ethernet cards and dedicating the first eight cores to I/O (two per interface). This is all about latency, but with proper care it also improves throughput.
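
On Linux, the "dedicate the first eight cores" part is usually done with CPU affinity. Below is a hedged sketch using Python's `os.sched_setaffinity`; `plan_cores`, `pin_current_thread`, `run_pinned` and the `ethN` interface names are hypothetical, and the two-cores-per-interface split simply mirrors the comment above. Steering each NIC's interrupts or queues onto those cores is a separate, driver-specific step that is not shown here.

```python
import os
import threading


def plan_cores(cpus, n_ifaces: int, per_iface: int = 2):
    """Give the lowest-numbered CPUs to I/O polling (per_iface per interface)
    and leave the rest to the application."""
    cpus = sorted(cpus)
    need = n_ifaces * per_iface
    if need >= len(cpus):
        raise ValueError("no CPUs would be left for the application")
    io = {
        f"eth{i}": cpus[i * per_iface:(i + 1) * per_iface]  # hypothetical NIC names
        for i in range(n_ifaces)
    }
    return io, cpus[need:]


def pin_current_thread(cpus) -> None:
    """Linux-only: with pid 0, sched_setaffinity(2) applies to the calling thread."""
    os.sched_setaffinity(0, cpus)


def run_pinned(cpu: int) -> set:
    """Start a worker that pins itself to one CPU and reports its new mask."""
    seen = {}

    def worker() -> None:
        pin_current_thread({cpu})
        seen["mask"] = os.sched_getaffinity(0)
        # a real I/O poller would now spin on its ring buffer here

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return seen["mask"]
```

For the 32-core, four-NIC setup described above, `plan_cores(range(32), 4)` assigns CPUs 0 to 7 to I/O in pairs and leaves CPUs 8 to 31 to the application; only the pinned worker's mask changes, not the rest of the process.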

barbecue_sauce 2740 days ago

Do you have to write your own software to do this, or can it be accomplished through OS configuration?

joatmon-snoo 2740 days ago

Someone more familiar with kernel internals than me should clarify, but my understanding is that I/O generally happens via a syscall, which requires the thread/process in question to switch between userspace and kernel space, and that switch can be very expensive. By enabling I/O polling in userspace, you avoid it.

int0x80 2740 days ago

Yes. Instead of using a syscall to get/issue events you use a mapped ring buffer. See https://lwn.net/Articles/743714/
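
The "mapped ring buffer" point can be shown with two processes that share one mapping: after `fork()`, parent and child see the same pages, so data moves by plain loads and stores plus counter updates, with no `read()`/`write()` per item. This is a hedged, Unix-only Python sketch; `run_shared_ring`, `SLOTS` and the header layout are hypothetical and are not io_uring's or aio's actual ring format. It also assumes a strongly ordered CPU such as x86-64; real code would use release/acquire atomics for the counter updates.

```python
import mmap
import os
import struct

SLOTS = 64                   # ring capacity (hypothetical)
HDR = struct.Struct("=QQ")   # head and tail counters at offsets 0 and 8
U64 = struct.Struct("=Q")    # one 8-byte payload per slot
HEAD, TAIL = 0, 8


def _load(buf, off: int) -> int:
    return U64.unpack_from(buf, off)[0]


def _slot(i: int) -> int:
    return HDR.size + (i % SLOTS) * U64.size


def run_shared_ring(n_items: int = 5_000) -> int:
    # An anonymous mmap is MAP_SHARED by default on Unix, so the child created
    # by fork() sees the very same (zero-filled) pages as the parent.
    buf = mmap.mmap(-1, HDR.size + SLOTS * U64.size)
    pid = os.fork()
    if pid == 0:  # child: producer (plays the kernel side)
        try:
            for i in range(n_items):
                head = _load(buf, HEAD)
                while head - _load(buf, TAIL) == SLOTS:
                    pass  # full: spin on the consumer's counter
                U64.pack_into(buf, _slot(head), i)
                # publish; relies on x86-64 store ordering (C: a release store)
                U64.pack_into(buf, HEAD, head + 1)
        finally:
            os._exit(0)  # never return into the parent's code path
    total = 0
    for _ in range(n_items):  # parent: consumer polls the shared head
        tail = _load(buf, TAIL)
        while _load(buf, HEAD) == tail:
            pass
        total += _load(buf, _slot(tail))
        U64.pack_into(buf, TAIL, tail + 1)
    os.waitpid(pid, 0)
    buf.close()
    return total
```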

geofft 2740 days ago

The motivating benefit is performance, but a side benefit the author mentioned on Twitter (https://twitter.com/axboe/status/1073320502532263936) is sidestepping Meltdown and similar vulnerabilities that arise from having the kernel and the application in the same address space (even though they're separated by a privilege boundary). In a scheme like this, you can theoretically dedicate one core to the application and a separate one to the kernel, and minimize speculation, cache sharing, etc. between the two. The application and the kernel share a portion of memory, so the kernel never has to run on the application's CPU.

This is of questionable practicality for a general-purpose machine, but for a server used entirely as a hypervisor, web server, file server, or something similar, it might fit really well.
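
The split described above, with the application submitting requests through one shared queue and a dedicated "kernel" core polling it and posting results to another, can be sketched as follows. Everything here is hypothetical (`PollingBackend`, `read_all`, the in-memory `DISK`). It is loosely modeled on io_uring's submission/completion queues and its `IORING_SETUP_SQPOLL` mode, where a kernel thread polls the submission queue, but it is not that API. In a real deployment the two sides would also be pinned to separate cores.

```python
import threading
from collections import deque

DISK = bytes(range(256)) * 16  # hypothetical 4 KiB "device"


class PollingBackend:
    """Stand-in for the kernel side: one thread spins on the submission
    queue, serves reads from DISK and posts completions."""

    def __init__(self) -> None:
        self.sq = deque()  # app -> backend: (user_data, offset, length)
        self.cq = deque()  # backend -> app: (user_data, data)
        self.running = True
        self.thread = threading.Thread(target=self._poll, daemon=True)
        self.thread.start()

    def _poll(self) -> None:
        while self.running:
            try:
                user_data, off, length = self.sq.popleft()
            except IndexError:
                continue  # nothing submitted yet: keep spinning
            self.cq.append((user_data, DISK[off:off + length]))

    def close(self) -> None:
        self.running = False
        self.thread.join()


def read_all(backend: PollingBackend, requests):
    """Submit every (offset, length) read, then reap completions by polling."""
    for tag, (off, length) in enumerate(requests):
        backend.sq.append((tag, off, length))
    results = {}
    while len(results) < len(requests):
        try:
            tag, data = backend.cq.popleft()
        except IndexError:
            continue  # no completion yet: poll again
        results[tag] = data
    return [results[tag] for tag in range(len(requests))]
```

The `user_data` tag travels with each request so completions can be matched up even if a backend finishes them out of order.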

cyphar 2740 days ago

But wouldn't having the kernel pinned to a different core hurt performance due to NUMA, or through having to do lots of cross-calls?

geofft 2740 days ago

Depends on the use case; keep in mind that syscalls are slow, too. If you have an application that does significant computation on lots of data (think a scientific calculation/simulation), having another core on the same socket read ahead from disk to RAM might be much more efficient than pausing computation to read synchronously. Or if you're a file server that is just passing things back to the kernel's network layer, you might not even need to see the contents of RAM yourself.
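
The read-ahead case is about overlap rather than syscall avoidance: one thread keeps fetching the next chunk from disk while another computes on the current one. A minimal Python sketch, with a hypothetical `checksum_with_readahead` standing in for the "scientific calculation" and a bounded `queue.Queue` as the hand-off. Note that the reader still makes ordinary blocking `read()` syscalls; it just does so off the compute thread.

```python
import queue
import threading

CHUNK = 64 * 1024  # hypothetical read size


def read_ahead(path: str, q: queue.Queue) -> None:
    """Runs on another thread (ideally another core) and fills q from disk."""
    try:
        with open(path, "rb") as f:
            while chunk := f.read(CHUNK):
                q.put(chunk)  # blocks only when the consumer falls behind
    except OSError as exc:
        q.put(exc)  # hand the error to the consumer instead of hanging it
        return
    q.put(None)  # end-of-file marker


def checksum_with_readahead(path: str) -> int:
    q = queue.Queue(maxsize=2)  # at most two chunks buffered ahead
    reader = threading.Thread(target=read_ahead, args=(path, q), daemon=True)
    reader.start()
    total = 0
    while (chunk := q.get()) is not None:
        if isinstance(chunk, OSError):
            raise chunk
        total = (total + sum(chunk)) % (1 << 32)  # stand-in for real computation
    reader.join()
    return total
```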

snaky 2740 days ago

Performance, performance and performance.

sargun 2740 days ago

Performance, latency variability and sanity. It’s somewhat easier to write applications without having to make a syscall, which may have unknown latency (even if it’s a non-blocking poll).

rurban 2740 days ago

Should be about 50x faster, based on my rough estimates.
