| The kernel is responsible for maintaining the wait queues and for making sure there is no race on the state that should preclude queueing. It does not care how you use the queue at all. The queue doesn't have to back a locking primitive; you can use the exact same mechanism to implement a thread pool with a set of dormant threads, for instance. The state check in the basic futex exists only to avoid a race condition (a wakeup lost between checking the state and going to sleep). None of the logic that keeps threads out of critical sections is in the kernel's purview, either. That's all application-level. And most importantly, no real lock uses a futex for the locking part. As mentioned in the article, a mutex will typically try to acquire the lock directly with an atomic operation, like an atomic fetch-and-or, fetch-and-add, or compare-and-swap. A single atomic op, even with full sequential consistency (which can come with a pipeline stall), is still a lot cheaper than a trip into the kernel when you can avoid it. Once again, I'm not saying you couldn't use the futex state check to decide what's locked and what's not. I'm saying nobody should, and it was never the intent. The intent from the beginning was to separate the locking from the waiting, and I think that's pretty clear in the original futex paper (linked to in my article). |
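
To make the split between locking and waiting concrete, here is a minimal Linux-only sketch in C. It is not the article's code or any particular library's mutex. The names `sketch_mutex`, `sketch_lock`, `sketch_unlock`, and `run_demo` are made up for illustration, and the three-state encoding (0 unlocked, 1 locked, 2 locked with possible waiters) is just one common choice. The lock itself is taken with a compare-and-swap in user space; the futex calls only put a thread to sleep or wake one up.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical type. 0 = unlocked, 1 = locked, 2 = locked, waiters possible. */
typedef struct { _Atomic uint32_t state; } sketch_mutex;

/* Thin wrapper over the raw syscall; glibc does not export a futex() function. */
static long futex(_Atomic uint32_t *addr, int op, uint32_t val) {
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

static void sketch_lock(sketch_mutex *m) {
    uint32_t expected = 0;
    /* Fast path: the locking decision is one atomic op, no kernel involved. */
    if (atomic_compare_exchange_strong(&m->state, &expected, 1))
        return;
    /* Slow path: mark the lock contended. If the exchange returns 0 we now
     * own the lock. Otherwise ask the kernel to sleep, which it does only
     * if the word still reads 2; that check is what prevents a lost wakeup. */
    while (atomic_exchange(&m->state, 2) != 0)
        futex(&m->state, FUTEX_WAIT_PRIVATE, 2);
}

static void sketch_unlock(sketch_mutex *m) {
    /* Releasing is also a single atomic op; wake someone only if needed. */
    if (atomic_exchange(&m->state, 0) == 2)
        futex(&m->state, FUTEX_WAKE_PRIVATE, 1);
}

static sketch_mutex demo_lock;   /* zero-initialized, so it starts unlocked */
static long demo_counter;

static void *demo_worker(void *arg) {
    long iters = *(long *)arg;
    for (long i = 0; i < iters; i++) {
        sketch_lock(&demo_lock);
        demo_counter++;          /* plain increment, protected by the lock */
        sketch_unlock(&demo_lock);
    }
    return NULL;
}

/* Runs nthreads workers (at most 16) and returns the final counter value. */
long run_demo(int nthreads, long iters) {
    pthread_t tids[16];
    assert(nthreads > 0 && nthreads <= 16);
    demo_counter = 0;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tids[i], NULL, demo_worker, &iters);
        assert(rc == 0);
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return demo_counter;
}
```

Calling `run_demo(4, 100000)` should return 400000 if the lock is doing its job. In the uncontended case neither `sketch_lock` nor `sketch_unlock` makes a syscall, which is the whole point: the kernel only ever sees the waiting, never the locking.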