Hacker News new | ask | show | jobs
by elteto 2356 days ago
> checked in a tight loop by all waiters

This actually does not have to be this way. You could have a linked list of spinlocks, one for each waiter. Each waiter spins on its own, unique spinlock. When the previous waiter is done it unlocks the next spinlock, and so on. The implementation gets a bit complicated on non-GC languages, since there are races between insertion/removal on the linked list. If the number of threads is fixed and long-lived then it becomes easier, since instead of a linked list you can have an array of spinlocks.

Note: in some architectures (x86?) you could possibly do away with atomics, since (I believe) int updates are atomic by default. Not really sure though.

3 comments

Yep, that's the underlying principle behind Mellor-Crummey Spinlock: https://lwn.net/Articles/590243/.
Yes! The MCS spinlock, the name eluded me. The paper is actually a pretty good read (it is linked in the lwn article).
> you could possibly do away atomics, since (I believe) int updates are atomic by default.

The release can be a compiler only barrier followed by a simple store, but you do need an atomic RMW in the acquire path.

It is technically possible to implement a lock with just loads and stores (see Peterson lock[1]) even in the acquire path but it does require a #StoreLoad memory barrier even on Intel, which is as expensive as an atomic RMW so it is not worth it.

Edit:fixed barrier name typo.

[1] https://en.m.wikipedia.org/wiki/Peterson%27s_algorithm

> #StoreRelease memory barrier even on Intel

What is that? x86 is TSO...

Do you have an example of the full acquire and release sections in x86 assembly?

I'm sorry, I meant #StoreLoad.

Edit: here is a good (but long) article with a x86 Peterson Lock example (with an analysis of the critical race without the barrier):

https://bartoszmilewski.com/2008/11/05/who-ordered-memory-fe...

NetApp has a very interesting implementation RW spin lock inside the kernel. I tried optimizing to make it more fair for writers by introducing a single variable that would be check in each iteration.

The additional delay it added for checking the state resulted in deadlock timer triggering a dump! Hence, checking a variable is very expensive.

If you're interested in very efficient MCS style reader writer locks, check out this paper:

https://people.csail.mit.edu/mareko/spaa09-scalablerwlocks.p...