by dcolkitt 2363 days ago
> Second, the uncontended case looks like
> Parking_lot::Mutex avg 6ms min 4ms max 9ms

This estimate is way too high for the uncontested mutex case. On a modern Linux/Xeon system using GCC, an uncontested mutex lock/unlock is well under 1 microsecond.

I have a lot of experience here from writing low-latency financial systems. The hot path we use is littered with uncontested mutex lock/unlock, and the whole path still runs under 20 microseconds. (With the vast majority of that time unrelated to mutex acquisition.)

The benchmark used in the blog post must be spending the vast majority of its time in some section of code that has nothing to do with lock/unlock.

3 comments

You're misreading the benchmark: that's 6ms for 10,000 lock/unlocks per thread, 320,000 lock/unlocks total. In other words, 0.6 microseconds per lock/unlock per thread.
That's still unreasonably high, isn't it? Even a Go sync.Mutex, not exactly a hot-rod implementation, can be acquired and released in < 50ns on the garbage hardware I have before me.
On Intel (and probably very similar on AMD) the cost of a completely uncontended, cache-hit, simple spin lock acquisition is ~20 clock cycles, while the release is almost free.
This is the time for the whole benchmark run, not for an individual lock/unlock. The article is quite clear on that.
As we say in low-latency finance, "a microsecond is an eternity."

If you have threads interacting, whether via mutexes or spinlocks, you have a high-latency system.