| > linux futex can be implemented through the VDSO (can somebody correct me on this), so that eliminates the worse of the sycall costs. The semantic of a futex wait is a request to the kernel to put the thread to sleep until another thread issues a signal for the same "key". The trick is that the key is the address of a 32 bit memory location (provided to the kernel in both the signal and wait syscalls), which is normally the mutex address. So a process first tries to acquire the mutex as for a normal spin lock (with a CAS, exchange, xadd or whatever work for the specific algorithm) attempting to set the lock bit to 1; if it fails (possibly after spin-trying a few times), it invokes the futex wait. As you can see the syscall is only done in the slow path. On a futex release, the simple solution is to always invoke futex signal syscall after setting the lock bit to zero[1]. To fast path the relase, a wait bit can be set on the acquire path just before calling futex wait, so on the release path, when setting the lock bit to zero, signal would only be called if the waiter bit was set (and the cleared by together with the lock bit)[2]. As you can see the futex syscall is already the slow path and never need to be onvoked in the uncontented case. In fact the kernel doesn't even need to know about the futex untill the first contention. [1] futexes are edge triggered, same as condition variables, so a signal will only wakes any thread blocked ona wait that happened-before the signal call. Thus there is a race condition if a lock rrlease and signal happens between the failed acquire attempt and the wait call; to prevent this futex wait as an additional parameter that is the expected value of the memory location: the kernel checks the futex address against this valur and will only if it hasn't changed will put the thread to sleep (this is done atomically). [2] as there could me multiple waiters, a broadcast is actually needed here which leads to a thundering herd as all waiters will race to acquire the lock. The original futex paper used a wait count instead of just a bit but there are other options. |