| Which design faults do you think epoll specifically has? I know there are lots of file descriptors not usable with epoll or rather async i/o in general and that sucks (e.g. regular files). For networking, I find epoll/sockets nicer to work with than Windows' IOCP, because with IOCP you need to keep your buffers around until the kernel deems your operation complete. I think you have 3 options: 1) Design the whole application to manage buffers like IOCP likes (this propagates to client code because now e.g. they need to have their ring buffer reference-counted). 2) You handle it transparently in the socket wrapper code by using an intermediate buffer and expose a simple read()/write() interface which doesn't require the user to keep a buffer around when they don't want the socket anymore. 3) You handle it by synchronously waiting for I/O to be cancelled after using CancelIo. This sounds risky with potential to lock up the application for an unknown amount of time. It's also non-trivial because in that time IOCP will give you completion results for unrelated I/Os which you will need to buffer and process later. On the other hand, with Linux such issues don't exist by design, because data is only ever copied in read/write calls which return immediately (in non-blocking mode). |
In Linux, you don't, and userspace and kernelspace both end up doing unnecessary copying, and it means with shared buffers, something might poll and not-block, but actually block by the time you get around to using the buffer.
This is annoying, and it generally means you need more than two system calls on average for every IO operation in performance servers.
As a rule, you can generally detect "design faults" by the number of competing and overlapping designs (select, poll, epoll, kevent, /dev/poll, aio, sigio, etc, etc, etc). I personally would have preferred a more fleshed out SIGIO model, but we got what we got...
One specific fault of epoll (compared to near-relatives) is that the epoll_data_t is very small. In Kqueue you can store both the file descriptor with activity (ident) as well as a few bytes of user data. As a result, people use a heap pointer which causes an extra stall to memory. Memory is so slow...