| "The problem with adding a file descriptor to a per-thread event loop’s epoll/kqueue set is that it’s just not optimal if you want to continually ensure you’re saturating your hardware (either CPU cores or I/O)" Both epoll and kqueue permit multiple threads to poll the same event set. Normally you do this in tandem with edge-triggered readiness (EPOLLET on Linux, EV_CLEAR on BSD) so that only one thread will dequeue an event. How do think IOCP is implemented in Windows? There's a thread pool in the kernel which _literally_ polls a shared event queue. It's just hidden so you can pretend it's magical. But conceptually it works almost identically to how you would do it in Unix. The benefit of IOCP is that it's a native API. It's warts and shortcomings notwithstanding, developers never even need to think about how it's actually implemented. Whereas with epoll and kqueue you either have to roll your own framework, or select from various third-party options. Seeing how the sausage is made can turn some people off. But just because you don't see the gory details doesn't mean it's implemented using magical fairy dust. There's much to recommend Windows, and many things the NT kernel conceptually gets right. But IOCP vs polling? The only real difference architecturally is how much of the stack sits in user-space vs kernel-space, and how much of the stack is delivered by Microsoft (all of if in the case of IOCP) vs other sources (in Linux, glibc does AIO, while all the event loop and callback code is provided by various libraries or written yourself). Putting more of the stack in kernel-space doesn't magically make it easier to perform optimizations. That's marketing speak and kernel fetishism. You have to first show why those optimizations can't be achieved elsewhere, like in the I/O or process scheduler. Various Linux components traditionally are more performant (e.g. process scheduling) than in Windows, so many of the optimizations wrt IOCP is arguably clawing back performance lost elsewhere in the system. |