Hacker News new | ask | show | jobs
by viega 298 days ago
It quite does; the kernel is not the keeper of the lock, it only needs to detect the race condition that would result in a spurious sleep. It cares not one bit about the rest of your semantics.

It's true you could use it that way, but it's not the way it's meant to be used, defeating the purpose by requiring a system call even for uncontended locks.

1 comments

I think you're misunderstanding how futexes work, or else making an essentially irrelevant semantic argument around a definition for "keeper". The kernel is, 100%, absolutely, the "keeper" of that lock data for the duration of the system call. It knows that (hardware) address and matches it to any other such syscall from any other process on the system. And that requires tracking and intelligence and interaction with the VM system and arch layer up and down the stack.

It just doesn't "allocate" it on its own and lets the process use its own mapped memory. But to pretend that it doesn't have to do any work or that the memory is "separated" betrays some misunderstandings about what is actually happening here.

The kernel is responsible for maintaining the wait queues, and making sure that there is no race condition on the state that should preclude queueing.

It does not care how you use the queue, at all. It doesn't have to be done with a locking primitive, whatsoever. You absolutely can use the exact same mechanism to implement a thread pool with a set of dormant threads, for instance.

The state check in the basic futex is only done to avoid a race condition. None of the logic of preventing threads from entering critical sections is in the purview of the kernel, either. That's all application-level.

And most importantly, no real lock uses a futex for the locking parts. As mentioned in the article, typically a mutex will directly try to acquire the lock with an atomic operation, like an atomic fetch-and-or, fetch-and-add, or even compare-and-swap.

A single atomic op, even if you go for full sequential consistency (which comes w/ full pipeline stalls), is still a lot better than a trip into the kernel when you can avoid it.

Once again, I'm not saying you couldn't use the futex state check to decide what's locked and what's not. I'm saying nobody should, and it was never the intent.

The intent from the beginning was to separate out the locking from the waiting, and I think that's pretty clear in the original futex paper (linked to in my article).

I like to think of a futex as the simplest possible condition variable, where the predicate is just the state of the memory word (note that a mutex guarding the predicate is unnecessary since the word can be read and written atomically). It turns out that this is simple enough to implement efficiently in the kernel, yet expressive enough to implement pretty much any userspace synchronization primitive over it.
You are of course completely right. In fact sometimes I wish that the kernel would do slightly more with the memory location, like optionally reserving a bit to show the empty/non empty state of the queue: the kernel should be able to keep it up to date cheaply as part of the wait/wake operations while is more complicated for userspace.