Hacker News
by kqueue 5797 days ago
Let's assume we have 20k open FDs.

With poll(), you have to copy this array of FDs from userland memory into kernel memory every time you call poll(). Now compare this with epoll() (let's assume we are using the EPOLLET trigger), where you only have to hand each file descriptor to the kernel once.
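A rough sketch of the "register once, wait many times" pattern, using Python's select.epoll on Linux. The helper name ready_after_each_write and the use of socketpairs are assumptions made for illustration; it uses the default level-triggered mode to keep the example short.

```python
import select
import socket


def ready_after_each_write(n_pairs=3):
    """Register n_pairs sockets once, then do one wait per write.

    Returns, for each wait, a list of booleans saying whether each
    reported fd was the one that had just been written to.
    """
    pairs = [socket.socketpair() for _ in range(n_pairs)]
    ep = select.epoll()
    try:
        # One-time registration: one epoll_ctl(EPOLL_CTL_ADD) per fd.
        for _, reader in pairs:
            ep.register(reader.fileno(), select.EPOLLIN)
        by_fd = {reader.fileno(): reader for _, reader in pairs}

        seen = []
        for writer, reader in pairs:
            writer.send(b"x")
            # The wait call carries no fd list; only ready fds come back.
            events = ep.poll(1.0)
            seen.append([fd == reader.fileno() for fd, _ in events])
            for fd, _ in events:
                by_fd[fd].recv(1)  # drain so level-triggered mode goes quiet
        return seen
    finally:
        ep.close()
        for a, b in pairs:
            a.close()
            b.close()
```

Each wait reports only the single fd that became readable, even though the interest set was handed to the kernel just once at the top.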

You might say the copying won't matter, but it does matter when a lot of events arrive on the 20k FDs, which means calling xpoll() at a higher rate and therefore more copying between userland and kernel (8 bytes per struct pollfd * 20k, ~160 KB each call).
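The per-call figure can be checked from the layout of struct pollfd. A small sketch, assuming the Linux layout (an int fd plus two shorts); the PollFd class and the bytes_per_poll_call helper are names made up for this example.

```python
import ctypes


class PollFd(ctypes.Structure):
    """Mirror of struct pollfd from <poll.h>, assuming the Linux layout."""
    _fields_ = [
        ("fd", ctypes.c_int),
        ("events", ctypes.c_short),   # requested events
        ("revents", ctypes.c_short),  # returned events, filled in by the kernel
    ]


def bytes_per_poll_call(nfds):
    """Size of the pollfd array handed to the kernel on each poll() call."""
    return nfds * ctypes.sizeof(PollFd)
```

With this layout, ctypes.sizeof(PollFd) is 8, so bytes_per_poll_call(20_000) comes to 160000 bytes per call, before counting the revents values the kernel writes back.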

2 comments

Yep, that's what I thought too, that at least epoll would be as fast. Turns out it's not though, but then I could be wrong.

Also, your assumption of EPOLLET is potentially wrong. I think (unproven) that the extra overhead and complexity of using edge-triggered mode correctly make EPOLLET pointless.

Sorry, I meant level-triggered. :) I think edge-triggered does add extra overhead, as you stated.
Why would there be extra overhead when using edge-triggered mode? There's definitely extra complexity on the client side, but it's close to what you're trying to do with super-poll (the extra complexity is basically to find out when an fd isn't busy anymore).
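To make the client-side complexity concrete: the usual pattern with edge-triggered notification is to keep reading a non-blocking fd until the kernel answers EAGAIN, which is the signal that the fd has gone idle. A minimal sketch, with drain as a hypothetical helper name:

```python
import socket


def drain(sock, chunk=4096):
    """Read a non-blocking socket until the kernel reports EAGAIN.

    Returns (data, peer_closed).
    """
    data = bytearray()
    while True:
        try:
            piece = sock.recv(chunk)
        except BlockingIOError:
            # EAGAIN / EWOULDBLOCK: nothing left, the fd is idle again.
            return bytes(data), False
        if not piece:
            # Zero-length read: the peer closed the connection.
            return bytes(data), True
        data += piece
```

For example, after a, b = socket.socketpair(), b.setblocking(False) and a.send(b"hello"), calling drain(b) returns (b"hello", False).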

I think it might even be faster, kernel-side. From what I remember of the implementation, both modes have to walk the same list of ready fds, but that list is shorter in edge-triggered mode, because fds get removed from the list as the walk goes.

Edge-triggered mode might have more overhead if many fds change between ready and not-ready quickly, but that's quite the wacky situation (and if it has an even distribution, it would ensure your ATR is about 0.5, so you're probably still winning).
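The difference between the two modes is easy to observe from userspace, even though this says nothing about how the kernel's internal ready list is implemented. A small sketch for Linux, where count_reports is a hypothetical helper: it writes one byte, never reads it, and counts how many consecutive waits report the fd.

```python
import select
import socket


def count_reports(flags, polls=3):
    """Write once, never read, and count how many waits report the fd."""
    a, b = socket.socketpair()
    ep = select.epoll()
    try:
        ep.register(b.fileno(), flags)
        a.send(b"x")  # one unread byte stays queued on b
        reports = 0
        for _ in range(polls):
            if ep.poll(0.05):  # short timeout so quiet waits return quickly
                reports += 1
        return reports
    finally:
        ep.close()
        a.close()
        b.close()
```

count_reports(select.EPOLLIN) gives 3, because a level-triggered fd stays reported while data is pending. count_reports(select.EPOLLIN | select.EPOLLET) gives 1, because the fd is reported once and then stays quiet until new data arrives.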

Why would there be any copying? The kernel can directly read userspace memory.
For the kernel to execute a system call, it has to place the arguments on its own stack. A system call doesn't execute in userland.
Yes, but the argument to poll() is a pointer. The pointer would be copied, but the kernel can still follow the pointer to userspace, right?
The pointer referred to by the process is not accessible by the kernel because when the user process was running, it had a different vm space than the kernel vm space. So if it just passes the pointer (without copying the pointer's data), then the kernel will point to a virtual address that won't exist until the user process gets swapped in again.
This sounds really strange to me. The kernel has full access to the page tables, so can't it look things up in userspace?
When the kernel is executing a function call placed on the stack, all the addresses on the stack are assumed to be in the same vm space. It does not know that an address is actually a virtual memory address belonging to process X, so it does not try to resolve it to the value in physical memory.
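One way to probe the pointer question from userspace on Linux is to call poll() through ctypes with a raw pointer. This is only a sketch: PollFd, poll_valid_pointer and poll_bad_pointer are names invented for the example, and it shows observable behavior rather than kernel internals. With a valid pointer the kernel reads fd and events through it and writes revents back into the caller's memory. With an unmapped pointer the call fails with EFAULT. As far as the Linux implementation goes, the array is still copied into kernel memory (via copy_from_user) rather than being used in place, so the per-call copy is real.

```python
import ctypes
import errno
import select
import socket


class PollFd(ctypes.Structure):
    """Mirror of struct pollfd, assuming the Linux layout."""
    _fields_ = [
        ("fd", ctypes.c_int),
        ("events", ctypes.c_short),
        ("revents", ctypes.c_short),
    ]


# Assumption: a Linux process where poll() is reachable via CDLL(None).
libc = ctypes.CDLL(None, use_errno=True)
libc.poll.argtypes = [ctypes.POINTER(PollFd), ctypes.c_ulong, ctypes.c_int]
libc.poll.restype = ctypes.c_int


def poll_valid_pointer():
    """Return (ready count, whether the kernel set POLLIN in our memory)."""
    a, b = socket.socketpair()
    try:
        a.send(b"x")
        fds = (PollFd * 1)(PollFd(b.fileno(), select.POLLIN, 0))
        n = libc.poll(fds, 1, 1000)
        return n, bool(fds[0].revents & select.POLLIN)
    finally:
        a.close()
        b.close()


def poll_bad_pointer():
    """Pass NULL with nfds=1; return (return code, errno)."""
    ctypes.set_errno(0)
    rc = libc.poll(None, 1, 0)
    return rc, ctypes.get_errno()
```

poll_valid_pointer() returns (1, True), and poll_bad_pointer() returns -1 with errno set to EFAULT.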