| I'm still confused coz this is exactly what I always thought the difference between epoll and select was. "what if, instead of the kernel telling us when something is ready for an action to be taken so that we can take it, we tell the kernel what action we want to take, and it will do it when the conditions become right." The difference between select and epoll was that select would keep checking in until the conditions were right while epoll would send you a message. That was game-changing. - I'm not really sure why this is seen as such a fundamental change. It's changed from the kernel triggering a callback to... a callback. |
io_uring: when any of these descriptors are ready, read into any one of these buffers I've preallocated for you, then let me know when it is done.
Instead of waking up a process just so it can do the work of calling back into the kernel to have the kernel fill a buffer, io_uring skips that extra syscall altogether.
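To make the "extra syscall" concrete, here is a rough sketch of the epoll side in Python (Linux only, since `select.epoll` doesn't exist elsewhere). The socket pair and the `received` list are just stand-ins I made up for a real connection; the point is only that the wakeup and the actual read are two separate trips into the kernel.

```python
import select
import socket

# A connected pair of sockets stands in for a client and a server.
a, b = socket.socketpair()
b.setblocking(False)

ep = select.epoll()
ep.register(b.fileno(), select.EPOLLIN)

a.sendall(b"hello")

# Trip 1: epoll_wait wakes us up and only reports that b is readable.
events = ep.poll(1.0)

# Trip 2: we have to call back into the kernel ourselves to fetch the bytes.
received = []
for fd, mask in events:
    if fd == b.fileno() and mask & select.EPOLLIN:
        received.append(b.recv(4096))

ep.close()
a.close()
b.close()
```

With io_uring, the second trip is what goes away: the completion you get back already means the buffer has been filled.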
Taking things to the next level, io_uring allows you to chain operations together. You can tell it to read from one socket and write the results into a different socket or directly to a file, and it can do that without waking your process pointlessly at any intermediate stage.
A nearby comment also mentioned opening files, and that's cool too. You could issue an entire command sequence to io_uring, then your program can work on other stuff and check on it later, or just go to sleep until everything is done. You could tell the kernel that you want it to open a connection, write a particular buffer that you prepared for it into that connection, then open a specific file on disk, read the response into that file, close the file, then send a prepared buffer as a response to the connection, close the connection, then let you know that it is all done. You just have to prepare two buffers up front, issue the commands (which could require either 1 or 0 syscalls, depending on how you're using io_uring), then do whatever you want.
You can even have numerous command sequences under kernel control in parallel; you don't have to issue them one at a time and wait for each to finish before you can issue the next one.
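If it helps to picture what a chained sequence means, here is a toy model in plain Python. To be clear, this does not touch io_uring at all: `run_chain` and the three fake operations are names I invented, and it only mimics the behaviour I understand linked requests to have, namely that steps run in order, each step gets its own completion, and once one step fails the rest of that chain is cancelled.

```python
import errno


def run_chain(ops):
    """Run (name, callable) steps in order; after a failure, cancel the rest.

    Returns one (name, result) completion per step. Failures are reported as
    negative errno values, and skipped steps as -ECANCELED.
    """
    completions = []
    failed = False
    for name, op in ops:
        if failed:
            completions.append((name, -errno.ECANCELED))
            continue
        try:
            completions.append((name, op()))
        except OSError as exc:
            failed = True
            completions.append((name, -exc.errno))
    return completions


# Hypothetical steps standing in for real I/O.
store = {}


def read_request():
    store["buf"] = b"ping"
    return len(store["buf"])


def write_file():
    raise OSError(errno.ENOSPC, "disk full")


def send_reply():
    return 0


# Two independent chains; neither one knows or cares about the other.
ok = run_chain([("read", read_request), ("reply", send_reply)])
broken = run_chain(
    [("read", read_request), ("write", write_file), ("reply", send_reply)]
)
```

In the real thing the kernel does this bookkeeping for you while your process does something else, and you just collect the completions afterwards.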
With epoll, you have to do every individual step along the way yourself, which involves syscalls, context switches, and potentially more code complexity. Then you realize that epoll doesn't even support regular files (epoll_ctl rejects them with EPERM), so you have to mix multiple approaches together to even approximate what io_uring is doing.
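The file limitation is easy to check for yourself. A small sketch, again Linux only, that tries to register an ordinary temporary file with epoll and records the error name in a variable I've called `result`:

```python
import errno
import select
import tempfile

ep = select.epoll()
with tempfile.TemporaryFile() as f:
    try:
        # Regular files don't support readiness notification via epoll.
        ep.register(f.fileno(), select.EPOLLIN)
        result = "registered"
    except OSError as exc:
        # Translate the numeric errno into its symbolic name.
        result = errno.errorcode.get(exc.errno, str(exc.errno))
ep.close()

print(result)
```

As far as I know this reports EPERM on Linux, which matches what the epoll_ctl man page says about regular files and directories.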
(Note: I've been looking for an excuse to use io_uring, so I've read a ton about it, but I don't have any practical experience with it yet. But everything I wrote above should be accurate.)