Hacker News new | ask | show | jobs
by ajamesm 3373 days ago
I don't like usage of the term "lock free" to mean "locks that are less costly".

There's a difference between (A) locking (waiting, really) on access to a critical section (where you spinlock, yield your thread, etc.) and (B) locking the processor to safely execute a synchronization primitive (mutexes/semaphores).

CAS is "lock free" only in the sense that it doesn't require the processor to stop the world in order to flip the mutex boolean. It's still a mutex, and it still gates access to a critical section, and you still need some kind of strategy to deal with waiting for the critical section to become available (e.g., spinlocking, signaling the OS to sleep thread execution).

  An example of operation using coarse locks would be a method in Java with “synchronized” keyword.
  If a thread T is executing a synchronized method on a particular object,
  no other concurrent thread can invoke any other synchronized method on the same object.
That's just putting one giant mutex around all access to the object. Having finer granularity != "lockless".
5 comments

> CAS is "lock free" only in the sense that it doesn't require the processor to stop the world in order to flip the mutex boolean

No, CAS is lock free in the sense that a process or thread that freezes for whatever reason while doing a CAS operation cannot cause the entire system to stop making progress (provided your hardware is working as intended).

That's the only thing that matters to say if something is 'lock free' or not, and CAS meets that definition.

If you want some other property that's fine, but that's not what anyone understands by 'lock free'.

Right, I think you looked past what I said, and why I put "lock free" in quotes:

> I don't like usage of the term "lock free" to mean "locks that are less costly".

There's a difference between being formally "lock free" meaning "the system is guaranteed to not globally deadlock", and this informal use of "lock free" meaning "CAS is more efficient because it doesn't block execution of other threads in obtaining a mutex"

This article's intro:

  Data structures (and their corresponding methods) implemented with coarse grained locks
  are hard to scale in highly parallel environments and workloads where the structure is required
  to be accessed by several concurrent threads simultaneously (parallelism).
is talking about performance, not termination guarantees, and that muddles the meaning of "lock free".
> CAS is "lock free" only in the sense that it doesn't require the processor to stop the world in order to flip the mutex boolean. It's still a mutex, and it still gates access to a critical section, and you still need some kind of strategy to deal with waiting for the critical section to become available (e.g., spinlocking, signaling the OS to sleep thread execution).

CAS can/"must" be used to implement normal mutex locks, but this definitely isn't it's only use. The only way you could view CAS itself as locking something is at a low cache-coherency level, where it requires the processor to lock a cache line (https://en.wikipedia.org/wiki/MESI_protocol ); however, this is at a level that isn't semantically observable: it is atomic with respect to things like OS thread scheduling, unlike mutexes. In some sense, this is the critical reason that CAS and lock-free programming is interesting, as it avoids problems like priority inversion, and gives the guarantee of at least one thread always making progress.

> That's just putting one giant mutex around all access to the object. Having finer granularity != "lockless".

Is this a comment about the article, or just a general statement? Because the article doesn't propose finer granularity locks as a lockless solution.

The linked list discussed in the article forgoes mutexes, but in exchange, operations aren't guaranteed to succeed:

  Because we are no longer taking explicit locks,
  there is no guarantee that insert operation will succeed.
  Therefore, CAS based algorithms are usually implemented
  using a loop (aka CAS loop) to retry the operation when
  CAS fails.
So you're still getting spinlocks (or context switches, etc.) assuming these inserts have to eventually succeed. Implicitly, the CAS does "lock" in the sense of forcing other accesses into retry loops. You're still dealing with the problems inherent to your waiting strategy. This data structure doesn't prevent problems like starvation. In the pathological case, you'll get stuck forever in a spinlock trying to insert.

> In some sense, this is the critical reason that CAS and lock-free programming is interesting

Right. The article's preface criticizes implementations of mutexes in C++ and Java as unscalable, which is certainly a problem, but categorically a different concern from that of designing lock-free algorithms or data structures.

One thread possibly being starved is very different to all threads being starved, as can happen with blocking algorithms (the classic example being the thread holding a lock is descheduled: no-one needing it can make any progress). Atomic instructions like CAS do not inherently have this problem, and can be used in ways to avoid it (such as how the article describes it), but the same is not true of mutexes.

I personally find it makes sense think of this is more of a transactional thing (one thread invalidates the transactions of others), rather than a mutex-esque critical section.

In any case, the article (accidentally or otherwise) does actually use the correct technical term for code where some threads may be starved while others make progress (lock-free). If it was trying to discuss code that avoid this, it would be "Notes on Wait Free Programming", and would likely be significantly longer.

  In this post, I will summarize my understanding of a non-blocking (lock free) design of linked list data structure...
The author conflates blocking with lock-free. Starvation is a form of blocking, non-blocking implies no starvation.
It isn't a form of blocking, and it doesn't imply that, in the conventional/widespread definitions of these terms: "Lock-freedom allows individual threads to starve but guarantees system-wide throughput" https://en.wikipedia.org/wiki/Non-blocking_algorithm#Lock-fr... . The subclassification of non-blocking that avoids starvation is wait-free ("Wait-freedom is the strongest non-blocking guarantee of progress, combining guaranteed system-wide throughput with starvation-freedom", from the Wikipedia article above).
In that case, I guess we're just drawing from different glossaries. I've always heard "blocked" used to mean "can't execute because of something other threads are doing" regardless of whether that's starvation or getting slept by a mutex.
> So you're still getting spinlocks (or context switches, etc.) assuming these inserts have to eventually succeed.

While CAS can be used to implement spinlocks (and we agree this certainly isn't lock free), I would not label all uses of CAS with loops as spinlocks. If it's impossible for another thread "A" to fail (or otherwise suspend) holding a mutex in a 'locked' or other intermediate state, such that no other thread can continue their desired execution while that thread "A" is suspended - coinciding with e.g. the possible dangers of deadlocks - it's not a 'lock' as I'd use the term.

Stretch the term "locking" much farther and then you can start getting into philosophical debates about just how "lock free" is an atomic increment? CPU cache coherency protocols between cores amount to something rather similar to locking when you get right down to it... at this point, you abandon "lock free" as a useful term (and should instead use terms like "share nothing", "single threaded", or "buggy" where you previously used the term "lock free".)

> You're still dealing with the problems inherent to your waiting strategy. This data structure doesn't prevent problems like starvation. In the pathological case, you'll get stuck forever in a spinlock trying to insert.

Potentially, yes. "Lock free" is not contention free. It is a different strategy for managing contention. It is not "cheaper locking" - throwing a mutex and locks at the problem can be cheaper than using "lock free" algorithms. Serialized access to a resource can be faster than two cores false sharing, constantly invalidating each other's cachelines.

FWIW, this seems to line up with the definitions on https://en.wikipedia.org/wiki/Non-blocking_algorithm , which suggests the term "wait free" to describe options that cannot deadlock or livelock (e.g. all threads have a guaranteed upper bound of operations, within which they'll make progress, regardless of the failure or suspension of other threads accessing the resource.)

> This data structure doesn't prevent problems like starvation

But that's fine! Lock-free permits starvation. Not every data structure can solve every problem. If you can't have starvation then lock-free isn't enough. In which case, this article isn't for you.

You realize you cherry-picked a snippet of my post, and then made the exact point I just made, which is:

There aren't deadlocks, but there is still waiting and potential starvation, all of which can be informally called "locks" and so we need to be careful about qualifying "locking".

This isn't a criticism of lock-free methodology, but a consideration of the ways in which "lock-free" could be interpreted.

> all of which can be informally called "locks"

> the ways in which "lock-free" could be interpreted

This is only a problem if you use a definition of the 'lock' in 'lock-free' that isn't the same as everyone else's.

I guess I must be misunderstanding what you're trying to say if that isn't it. I'll stop criticising.

Other commenters have provided great responses. But to add my voice to the mix: CAS is definitely not a mutex. A mutex, by definition of the term, enforces mutual exclusion. CAS cannot do that because it is a synchronization primitive. You can use CAS to implement a mutex. Or you can use CAS to implement synchronization which does not require mutual exclusion. Your synchronization primitive does not determine if your approach is lock-based or not. It's what you do with that synchronization primitive that determines if your approach is lock-based or not.

(Here, I use "synchronization primitive" to mean the processor instruction you use for synchronization. A construct such as a mutex or a spinlock is something you would build with a synchronization primitive.)

I've always understood lock-free to mean that every worker is guaranteed to make progress in a (finite) bounded amount of time.
I like the terminology that Dmitry Vyukov uses[0], where "wait-free" means that any individual thread is guaranteed to make progress, and "lock-free" means that some thread is guaranteed to make progress. There's also the weaker guarantee "obstruction-free" where livelock is a possibility.

[0] http://www.1024cores.net/home/lock-free-algorithms/introduct...

(It's worth noting that those are the standard terms for each of those concepts: https://en.wikipedia.org/wiki/Non-blocking_algorithm )
Couldn't there be an even stronger requirement like "obstruction-free"? I could imagine code that is "obstruction free" but which severely slows down execution for itself or other threads.
These are theoretical concepts. If there's a bound on the slowdown, it's lock-free. If there's no bound, it's obstruction free.

These theoretical concepts might or might not be that relevant for real-world applications. For example, in real-time audio (the main domain of interest to me for lock-free techniques), what you really care about is that the audio rendering thread will produce one buffer worth of sound before the hardware finishes outputting the previous buffer. This sounds like a match for "wait-free", but it depends on the details. If the non-audio thread can cause a 100x slowdown (but no more) that would technically be wait-free, but you'd miss your buffer, and so it's not acceptable. Another system might use mutexes, but be carefully engineered so that the locks are never held for more than 100µs (say, using locks designed to handle priority inversion), and thus reliably meet the needs even if it's not technically anything like lock-free.

But, as with anything, these concepts are useful tools, and I _love_ lock-free algorithms for audio.

Obstruction-free is the weakest requirement, so your use of "even stronger" confuses me.

In any case, in practice wait-free has this property a bit: getting the guarantee of all threads making progress in finite time generally requires in them executing slower individually, on average, e.g. one common strategy is for thread A help thread B finish its work, because thread A needs that result.

Executing a little slower would be fine. Executing two orders of magnitude slower can be almost as useless as being obstructed.
I've only ever heard that with "lock" being qualified as a deadlock, not a synchronization primitive.
Non-Blocking works a bit better as a phrase, in my opinion.
Oh god, "non-blocking" is even worse, no thanks to Javascripters. Lock-free data structures can wait, and can block if they deal with failed operations by spinlocking.

The muddled concepts are: waiting - what you do until you eventual acquire a contested resource (spinlock) mutexes - objects used to enforce critical sections stopping the world - a processor stopping the execution of all threads in order to ensure proper order of thread-critical instructions (is there a formal name for this?) CAS - an atomic primitive often used to avoid the overhead of stopping the world to enter critical sections, or to avoid deadlock by implementing algorithms that do not have critical sections critical section - a routine that must be executed by only one thread at a time blocking - being unable to proceed with a given routine on account of another thread non-blocking / async - threading in general, or, instead of spinlocking, yielding execution to another thread deadlocking - a systematic failure wherein there is a cyclical dependency of threads on resources, and no thread can progress

All of these can be informally referred to by "locking", and lock-free (formal definition) algorithms avoid deadlocking by avoiding having critical sections or locks on resources at all. They're generally more performant, but that's coincidental.

If you have a critical section, and you implement a mutex using CAS, you've probably sped up your application because you no longer have to pause other threads. That's "lock free" in the sense that "stopping the world" is "locking" other threads, but not "lock free" in the sense of avoiding deadlock. If a thread enters a critical section with a CAS mutex and then waits forever, you can get deadlock.

You keep using terms in a non standard way. Please don't cinfuse matters.

A lock free data structure that deal with failed operations with a spin lock is not lock free.

A mutex is not lock free full stop.