| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by ww520 2363 days ago

Modern day user-mode pthread mutex uses the futex kernel syscall [1] to implement the lock, which avoids the syscall in the non-contended case, so it can be very fast, acquiring the lock in the user-mode code entirely. I'm not sure whether the Rust's mutex API is a wrapper on the pthread mutex or calling the old kernel's mutex syscall directly.

Basically the user mode mutex lock is implemented as:

    // In user-mode, if the lock flag is free as 0, lock it to 1 and exit
    while !atomic_compare_and_swap(&lock, 0, 1)
        // Lock not free, sleeps until the flag is changed back to 0
        futex_wait(&lock, 0)  // syscall to kernel

When futex_wait() returns, it means the flag has been set back to 0 by the other thread's unlock, and the kernel wakes my thread up so I can check it again. However, another thread can come in and grab the lock in the meantime, so I need to loop back to check again. The atomic CAS operation is the one acquiring the lock.

[1] https://github.com/lattera/glibc/blob/master/nptl/pthread_mu...

Edit: the atomic_compare_and_swap can be just a macro to the assembly CMPXCHG, so it's very fast to acquire a lock if no one else holding the lock.

1 comments

bluejekyll 2363 days ago

https://github.com/rust-lang/rust/blob/master/src/libstd/sys...

Looks like it’s relying glibc for lock elision.

Mind you the parking_lot::Mutex in the article is not the stdlib implementation, documented here: https://docs.rs/parking_lot/0.10.0/parking_lot/type.Mutex.ht...

And that looks like it’s not using pthread, instead relying on primitives: https://github.com/Amanieu/parking_lot/blob/master/src/raw_m...

link

ww520 2363 days ago

Then it's using the futex implementation. It's very efficient.

link

bluejekyll 2363 days ago

I updated my comment. The fastest impl, parking_lot, appears to be a ground up atomic based mutex that doesn’t rely on pthread at all.

link

ww520 2363 days ago

It seems to be doing similar logic.

1. Does a CAS with compare_exchange_weak() at line 69.

2. Then call lock_slow() at line 72 to do spinlocking (Guh!).

3. The call to parking_lot_core::park() at line 256 seems to sleep wait.

link

acqq 2363 days ago

So this it acquires fully in userspace if there's no contention, it even spins if there is contention, and then if that wasn't enough it lets the thread sleep with the timeout.

Which matches the description of that library:

https://github.com/Amanieu/parking_lot

"This library provides implementations of Mutex, RwLock, Condvar and Once that are smaller, faster and more flexible than those in the Rust standard library"

"Uncontended lock acquisition and release is done through fast inline paths which only require a single atomic operation.

Microcontention (a contended lock with a short critical section) is efficiently handled by spinning a few times while trying to acquire a lock.

The locks are adaptive and will suspend a thread after a few failed spin attempts."

The only thing that I'm missing is how often the sleeping threads wake, as a bad constant can increase the CPU use.

link

temac 2363 days ago

Does it really sleep for a specified period instead of doing directed wake-ups? If so that's very far from ideal...

link

ww520 2363 days ago

Ah, thanks for the info.

I would imagine the unlocking call would wake up the sleeping thread so it won't keep waking up periodically.

The user-mode only implementation sounds interesting. The only downside is it won't work with inter-process locking.

link

AstralStorm 2363 days ago

How does it know that said critical section is short? Guessing?

Likely a hard linear-quadratic expanding threshold. They tend to waste a lot of CPU power in specific algorithms.

There are better, more intrusive ways of solving this problem on language level.

link

bluejekyll 2363 days ago

Yes. I just meant to say that it’s not a pthread based implementation, potentially making it more portable.

link

asveikau 2363 days ago

The P in pthread is the P from Posix which stands for portable.

Now, the "zero syscalls for uncontended pthread mutices via futex" optimization is Linux specific and may not be replicated elsewhere. Or it may. It's not Posix, but I know for instance win32 critical sections look a lot like spinlocks when not contended but do syscalls to block, which sounds a lot like a futex. So that would put that technique as dating to the 1990s at the latest. Futex landed in 2002.

link