
While there are ways to deschedule both userspace and kernel threads, there is no mechanism to deschedule a userspace thread while it's executing in kernel mode, partway through a blocking syscall.

Think of it like trying to deschedule a userspace thread in the middle of it having jumped to kernelspace to handle an interrupt. It just wouldn't work; that's not a pre-emptible state, not one that can be cleanly represented during a context switch with a PUSHA, not one where pre-emption would leave the kernel in a known state, etc.

So the CPU core would be tied up: the original thread can't be descheduled, and would instead still be "stuck" in the middle of the system call, busy-waiting on the result of the callback. To make the callback actually happen in this hypothetical design, the execution of the callback would need to be scheduled onto another CPU core, using some system-global callback-scheduler like Apple's libdispatch.

Note that this is also why, in Linux, processes stuck in the D state (uninterruptible sleep) are unkillable. They're parked "inside" a blocking system call that can't be interrupted, so even a SIGKILL from whatever is trying to hard-kill them stays pending: acting on it requires the system call to at least return to a point where the kernel resources involved can reach a known postcondition state.
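If you want to see which tasks are currently in that state, here's a rough sketch that reads the state letter out of `/proc/<pid>/stat`. It assumes a Linux-style procfs mounted at `/proc`; the helper names `task_state` and `uninterruptible_pids` are made up for illustration.

```python
import os

def task_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The comm field is wrapped in parentheses and may itself contain
    # spaces or ')', so split after the LAST closing parenthesis.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]

def uninterruptible_pids(proc_root="/proc"):
    """List PIDs whose state is 'D' (assumes a Linux procfs at proc_root)."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries like /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                if task_state(f.read()) == "D":
                    pids.append(int(entry))
        except OSError:
            continue  # the process exited while we were scanning
    return pids

if __name__ == "__main__":
    print(uninterruptible_pids())
```

On a healthy machine this usually prints an empty or very short list, since most D-state sleeps only last for the duration of a disk or network operation.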

And this is why innovations like io_uring make so much sense in Linux: they allow a userspace process to 1. start a long-running operation without blocking a thread on it, while also 2. communicating asynchronously with the kernel about that in-flight operation, by queuing messages back and forth through rings shared with the kernel (submissions one way, completions the other). (Picture, say, a sendfile(2)-style transfer whose completion you pick up from the ring, and which you can ask to cancel gracefully by queuing a cancel request.)
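To make the ring idea concrete, here's a toy model of the queue-back-and-forth pattern. To be clear, this is not the real io_uring API: `ToyRing`, `submit_copy`, `submit_cancel`, `kernel_tick` and `reap` are all invented names, the "kernel" is just a method you call by hand, and real cancellation semantics are more involved than this.

```python
from collections import deque

class ToyRing:
    """Toy model of a submission/completion ring pair (NOT real io_uring)."""

    def __init__(self):
        self.sq = deque()    # submission queue: userspace -> "kernel"
        self.cq = deque()    # completion queue: "kernel" -> userspace
        self.inflight = {}   # tag -> remaining work units

    def submit_copy(self, tag, units):
        # Queue a long-running operation; returns immediately, never blocks.
        self.sq.append(("copy", tag, units))

    def submit_cancel(self, target_tag):
        # Queue a request to cancel an operation that is already in flight.
        self.sq.append(("cancel", target_tag, None))

    def kernel_tick(self):
        # Stand-in for the kernel side: drain submissions first...
        while self.sq:
            op, tag, arg = self.sq.popleft()
            if op == "copy":
                self.inflight[tag] = arg
            elif op == "cancel" and tag in self.inflight:
                del self.inflight[tag]
                self.cq.append((tag, "cancelled"))
        # ...then advance every in-flight operation by one unit of work.
        for tag in list(self.inflight):
            self.inflight[tag] -= 1
            if self.inflight[tag] == 0:
                del self.inflight[tag]
                self.cq.append((tag, "done"))

    def reap(self):
        # Userspace side: collect whatever completions have been posted.
        events = list(self.cq)
        self.cq.clear()
        return events
```

The point of the shape is that the submitting thread never sits inside the operation: it queues a request, goes off to do something else, and later either reaps a completion or queues a cancel for the same tag.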