Hacker News
by omellet 3645 days ago
The problem isn't so much locking; it's that you have to dispatch to a kernel thread whenever you request or send data, paying the cost of that context switch every time. In userspace you can spin a polling thread on its own core and DMA data up and down to the hardware all day long without yielding your thread to another one.

The kernel is mapped into the top of the address space of every user-space process. That is generally pretty efficient, which is why it is done.
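A rough sketch of where "the top of the address space" is, assuming x86-64 with 4-level paging (48-bit virtual addresses) and the classic Linux split: the lower canonical half belongs to the process, the upper canonical half to the kernel. The constant and function names are hypothetical.

```python
# Hypothetical helper, assuming x86-64 with 4-level paging (48-bit virtual
# addresses) and the classic Linux split between user and kernel halves.
USER_SPACE_TOP = 0x0000_7FFF_FFFF_FFFF     # highest canonical user address
KERNEL_SPACE_BASE = 0xFFFF_8000_0000_0000  # lowest canonical kernel address

def classify(addr: int) -> str:
    """Say which half of a 64-bit virtual address space addr falls in."""
    if not 0 <= addr < 1 << 64:
        raise ValueError("not a 64-bit address")
    if addr <= USER_SPACE_TOP:
        return "user"           # lower half: this process's own mappings
    if addr >= KERNEL_SPACE_BASE:
        return "kernel"         # upper half: the same kernel mappings in every process
    return "non-canonical"      # the hole between the halves faults on access
```

With 5-level paging (57-bit addresses) the boundaries move to 0x00ff_ffff_ffff_ffff and 0xff00_0000_0000_0000, and mitigations such as kernel page-table isolation (KPTI) change how much of the upper half is actually mapped while running in user mode.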
Sure, that saves you from flushing TLB state, but you still need to save register state and copy data from a user-supplied buffer into a kernel-owned, device-mapped buffer, polluting the L1 data and instruction caches in the process.

For 99% of use cases this isn't a problem, but if you're trying to save every possible microsecond, then it definitely is.
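To make the per-call cost concrete, here is a rough Python sketch (assuming a Unix-like OS) that pushes the same 100 small messages through a pipe two ways: one write() per message, where every message pays its own kernel entry and user-to-kernel copy, versus coalescing them in user space and crossing into the kernel once. The batching comparison and all function names are illustrative assumptions, not something proposed in the thread, and a pipe's kernel buffer stands in for a device-mapped one.

```python
import os
import time

def write_all(fd: int, data: bytes) -> int:
    """Write all of data to fd; return how many write() system calls it took."""
    view = memoryview(data)
    calls = 0
    while view:
        written = os.write(fd, view)  # one user->kernel transition plus a copy
        view = view[written:]
        calls += 1
    return calls

def send_unbatched(fd: int, messages: list[bytes]) -> int:
    """One write() per message: every message pays its own kernel entry."""
    return sum(write_all(fd, msg) for msg in messages)

def send_batched(fd: int, messages: list[bytes]) -> int:
    """Coalesce in user space first, then cross into the kernel once."""
    return write_all(fd, b"".join(messages))

def read_exactly(fd: int, nbytes: int) -> bytes:
    """Drain exactly nbytes from fd."""
    chunks = []
    while nbytes > 0:
        chunk = os.read(fd, nbytes)
        if not chunk:
            raise EOFError("pipe closed early")
        chunks.append(chunk)
        nbytes -= len(chunk)
    return b"".join(chunks)

def write_demo(n_messages: int = 100, size: int = 64) -> dict[str, tuple[int, float]]:
    """Send the same messages both ways through a pipe; return (calls, seconds)."""
    messages = [bytes([i % 256]) * size for i in range(n_messages)]
    expected = b"".join(messages)  # 6,400 bytes by default: fits in a pipe buffer
    r, w = os.pipe()
    results = {}
    try:
        for name, send in (("unbatched", send_unbatched), ("batched", send_batched)):
            start = time.perf_counter()
            calls = send(w, messages)
            elapsed = time.perf_counter() - start
            if read_exactly(r, len(expected)) != expected:
                raise RuntimeError(f"{name}: data mismatch")
            results[name] = (calls, elapsed)
    finally:
        os.close(r)
        os.close(w)
    return results

if __name__ == "__main__":
    for name, (calls, secs) in write_demo().items():
        print(f"{name:>9}: {calls:3d} write() calls, {secs * 1e6:8.1f} us")
```

Absolute timings vary widely between machines and kernels, so only the call counts are reliable; the point is that the unbatched path pays the fixed kernel-entry cost 100 times instead of once.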

Sure, I was mainly responding to the parent post, which suggested that the cost was due to a "context switch", when it's not a context switch at all; it's a mode switch to "kernel mode."
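One way to see the difference between a mode switch and a context switch from user space is to count voluntary context switches with getrusage(2). The sketch below (Python, Unix-only because of the `resource` module; the helper names are hypothetical) makes 10,000 trivial getppid() system calls, which enter and leave kernel mode without giving up the CPU, and then sleeps 20 times, which makes the scheduler switch to something else.

```python
import os
import resource
import time

def voluntary_switches() -> int:
    """Voluntary context switches this process has made so far."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_nvcsw

def switches_during(fn, repeats: int) -> int:
    """Run fn repeatedly and report how many voluntary context switches that caused."""
    before = voluntary_switches()
    for _ in range(repeats):
        fn()
    return voluntary_switches() - before

# Mode switch only: getppid() enters the kernel and returns without blocking.
syscall_switches = switches_during(os.getppid, 10_000)

# Real context switch: sleeping blocks, so the scheduler runs another task.
sleep_switches = switches_during(lambda: time.sleep(0.001), 20)

print(f"10,000 getppid() calls: {syscall_switches} voluntary context switches")
print(f"20 x sleep(1 ms):       {sleep_switches} voluntary context switches")
```

On a typical Linux machine the first count stays at or near zero while the second grows by roughly one per sleep; exact numbers depend on the kernel and on whatever else the process is doing.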

If you are trying to save microseconds, you are probably running special hardware like Solarflare network cards, which also run their drivers in user space. The people doing that are generally hedge funds or high-frequency trading shops; I can't imagine anyone else could justify the price.