Hacker News new | ask | show | jobs
by chc4 890 days ago
For the concurrent queue at the bottom, the issue is with the ABA problem where you're guaranteed to execute the atomic compare-and-swap correctly, but you aren't guaranteed that the same pointer value means the queue is in the same state: if you have a queue `123` and start a dequeue, you load current head = 1, next = 2, and then can be scheduled out for another thread to execute a series of operations resulting in a queue of `145`. When your thread resumes, the compare-and-swap of 1->2 works fine...but now the head of your queue is the freed 2 node. Woops! Concurrent primitives like RCUs or hazard pointers (or garbage collection!) are needed in order to safely free data under most lockfree data structures.
2 comments

I use a tag masked into the stored data to try prevent the ABA problem.

Every thread has its own unique tag.

  long original = me->realend;                                                                                                              
  int tag = (original & TAG_MASK);                                                                                                          
  changed = (((original & END_MASK) >> 32)) % me->size;                                                                                   
  long new = (data->thread_tag) | (((changed + 1) % me->size) << 32); 
Then compare and swap as usual. If another thread updates then they shall fail the compare and swap and we have to reloop and try again.

I am still learning TLA+ to write a model.

Yeah, this stuff is hard even before you bring memory ordering into the picture. What's worse, checks such as thread sanitizer will not catch it, since it's not a simple data race. Proving functional invariants about multithreaded code requires a different class of tools.
Funnily enough, ARM has another difference here on top of just having a non-TSO memory model: LL/SC atomics solve the ABA problem, because the word holding the queue head has been written to and the store-conditional fails, even though the contents of the memory will be the same at the end. Which makes sense once you say it and some docs about LL/SC will mention that, but also reading various lockfree data structure papers I've basically never seen talked about (probably because LL/SC progress guarantees are kinda scuffed)
the issue with LL/SC is that it is hard to expose to higher level languages than assembler. What you can do within an LL/SC section without causing it to spuriously fail is very much architecture dependent and you need full control of the load and stores within it. Exposing it to compiler optimizations won't work reliably.

So in practice LL/SC, in higher level languages, is used to implement CAS, XCHG and other atomic primitives which don't allow taking advantage of the ABA resistance. As an additional downside, you get a weak CAS that you always need to call in an loop.

It would be nice if compilers can lower this back to LL/SC if that’s what you actually wanted.
Tired: deadlocks

Wired: livelocks

Are there tools out there that can prove semantic invariants in multi-threaded code? I don't understand how there can be automated tools around it at all because how would that even be possible?
There are model checkers such as nidhugg (C++), dscheck (ocaml). They take a test case and reach all possible terminal states by trying different interleavings.

Crucially, they don’t have to try all interleavings to reach all terminal states, making the enumeration quite fast.

Rust comes to mind.
How would Rust solve this problem?
All I meant was that it "proves semantic invariants in multi-threaded code," which proves the concept.
No data races is just a very tiny subset of semantic invariants, though.