by gpderetta 895 days ago
IMHO the graph in the Memory Barrier section is misleading [1]. It has the barriers spanning across threads, but that's not the right mental model. Something like this is more correct (note the additional barriers after the store and before the load to match seq_cst semantics):

```

  Thread 1                Memory                  Thread 2
  ---------               -------                 ---------
  |                          |                          |
  |   write(data, 100)       |                          |
  | -----------------------> |                          |
  |                          |                          |
  | ====Memory Barrier====== |                          |
  |   store(ready, true)     |                          |
  | -----------------------> |                          |
  | ====Memory Barrier====== |                          |
  |                          |                          |
  |                          | ===Memory Barrier======= |
  |                          |   load(ready) == true    |
  |                          | <----------------------  |
  |                          | ====Memory Barrier=====  |
  |                          |                          |
  |                          |       read(data)         |
  |                          | <----------------------  |
  |                          |                          |
```

I.e. barriers prevent reordering of operations within a thread, not across threads. It also makes it immediately obvious why the seq_cst ordering of both the thread 1 atomic store and the thread 2 atomic load can be relaxed: the last barrier in thread 1 does not prevent any reordering in this example, hence it can be omitted, leaving only the barrier before the store, which makes it a release operation. Similarly, we can omit the barrier before the load in thread 2, leaving only the barrier after it, which makes it an acquire operation.
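
For what it's worth, the relaxed version described above can be sketched in C++ roughly like this. It is a minimal sketch, not the article's code: the names `data`, `ready` and `run_release_acquire` are just illustrative, and the spin loop stands in for whatever waiting strategy real code would use.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Illustrative names only; this mirrors the diagram, not any real API.
int run_release_acquire() {
    int data = 0;                      // plain, non-atomic payload
    std::atomic<bool> ready{false};
    int seen = -1;

    std::thread producer([&] {
        data = 100;                    // write(data, 100)
        // Release: the write to data cannot be reordered after this store.
        ready.store(true, std::memory_order_release);
    });

    std::thread consumer([&] {
        // Acquire: the read of data cannot be reordered before this load.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = data;                   // read(data)
    });

    producer.join();
    consumer.join();
    assert(seen == 100);               // guaranteed by release/acquire
    return seen;
}
```

Each ordering constraint is attached to an operation in its own thread: the store only orders what comes before it in thread 1, the load only orders what comes after it in thread 2.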

[1] well, it is showing the effect of sequential consistency as opposed to acquire-release, so a logical barrier spanning threads is not necessarily wrong, but then you would still need to show a barrier before the last store and the first load.


Hey, so I'm curious why having memory barriers span across threads is the wrong mental model.

Assuming that the memory barrier is syncing across a single variable (in this case ready), why would it be correct to think of it as two separate barriers? If it were correct to think of it as two separate barriers on two separate threads, wouldn't there need to be some form of synchronization or linkage between the two barriers themselves so that memory barriers can be coupled together?

For instance, if I had release-acquire pairs on two variables, ready and not_ready, representing them with separate barriers might look something like this:

```

  Thread 1                Memory                  Thread 2
  ---------               -------                 ---------
  |                          |                          |
  |   write(data, 100)       |                          |
  | -----------------------> |                          |
  |                          |                          |
  | ====Memory Barrier====== |                          |
  |   store(ready, true)     |                          |
  | -----------------------> |                          |
  | ====Memory Barrier====== |                          |
  |                          |                          |
  | ====Memory Barrier====== |                          |
  |   store(not_ready, true) |                          |
  | -----------------------> |                          |
  | ====Memory Barrier====== |                          |
  |                          |                          |
  |                          | ===Memory Barrier======= |
  |                          |   load(ready) == true    |                   
  |                          | <----------------------  |
  |                          | ====Memory Barrier=====  |
  |                          |                          |
  |                          | ===Memory Barrier======= |
  |                          |   load(not_ready) == true|                   
  |                          | <----------------------  |
  |                          | ====Memory Barrier=====  |
  |                          |                          |
  |                          |       read(data)         |
  |                          | <----------------------  |
  |                          |                          |
```

Now, how does the processor know which memory barriers are linked together? I ask because without understanding which barriers are linked together, how is instruction re-ordering determined?

The linking of barriers in pairs is really just a mental model, not (usually) what happens at the hardware level. In fact, in the C++ memory model the synchronizes-with relationship is established through loads and stores (a load reading the value written by a store), not by barriers on their own, which only indirectly affect the properties of the loads and stores around them. That's another reason why I don't really like the memory barrier model and prefer to think in terms of happens-before dependency graphs.
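
To make that concrete for the two-flag case, here is a rough C++ sketch (names like `run_two_flags` are made up for illustration). Nothing in it tells the processor which barriers are "linked": the acquire load of `not_ready` pairs with whichever release store wrote the value it happened to read, and everything sequenced before that store in thread 1 then happens-before whatever follows the load in thread 2.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Illustrative names; `ready` and `not_ready` mirror the diagram above.
int run_two_flags() {
    int data = 0;                          // plain, non-atomic payload
    std::atomic<bool> ready{false};
    std::atomic<bool> not_ready{false};
    int seen = -1;
    bool ready_seen = false;

    std::thread t1([&] {
        data = 100;
        ready.store(true, std::memory_order_release);
        not_ready.store(true, std::memory_order_release);
    });

    std::thread t2([&] {
        // Synchronizes with the store whose value this load reads;
        // no pairing of barriers is declared anywhere.
        while (!not_ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        // Both earlier writes in t1 happen-before this point, so even a
        // relaxed load of `ready` must observe true here.
        ready_seen = ready.load(std::memory_order_relaxed);
        seen = data;
    });

    t1.join();
    t2.join();
    assert(ready_seen);
    return seen;
}
```

In happens-before terms the only cross-thread edge is store(not_ready) to load(not_ready); the rest follows from program order inside each thread.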

edit: AFAIK, seq_cst ordering (as opposed to acq_rel) is only relevant when you have more than two threads and you care about things like IRIW (independent reads of independent writes). In this case acquires and releases are not enough to capture the full set of constraints, although at the hardware level everything is still local.
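
For reference, the IRIW shape looks roughly like this in C++ (a sketch with made-up names). Two writers store to independent variables and two readers load them in opposite orders. With seq_cst on every operation, the readers cannot disagree about which store came first; with only acquire/release, the C++ model would allow that disagreement.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Returns true if the two readers observed the two stores in opposite
// orders. Illustrative name; not from the article.
bool iriw_readers_disagree() {
    std::atomic<int> x{0}, y{0};
    int r1 = 0, r2 = 0, r3 = 0, r4 = 0;

    std::thread w1([&] { x.store(1, std::memory_order_seq_cst); });
    std::thread w2([&] { y.store(1, std::memory_order_seq_cst); });
    std::thread rd1([&] {
        r1 = x.load(std::memory_order_seq_cst);
        r2 = y.load(std::memory_order_seq_cst);
    });
    std::thread rd2([&] {
        r3 = y.load(std::memory_order_seq_cst);
        r4 = x.load(std::memory_order_seq_cst);
    });

    w1.join();
    w2.join();
    rd1.join();
    rd2.join();

    // rd1 saw x written but not y; rd2 saw y written but not x.
    return r1 == 1 && r2 == 0 && r3 == 1 && r4 == 0;
}
```

Under seq_cst this function can never return true, because all four loads and both stores must fit into one total order.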

edit2: I guess the missing bit is that beyond the hardware fences you have the hardware cache coherency protocol, which makes sure that a total order of operations always exists once loads and stores reach the coherence fabric.

Yeah, I see your point about thinking in terms of dependency graphs. I actually got the idea for using a visual memory barrier from the Linux docs (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...) and the book C++ Concurrency in Action.

>I guess the missing bit is that beyond the hardware fences you have the hardware cache coherency protocol, which makes sure that a total order of operations always exists once loads and stores reach the coherence fabric.

Can you explain more about this?

Modern processors are out-of-order execution beasts. A barrier within a thread serves to enforce some ordering within that thread only: for example, that one store is performed after an earlier store, or that one load is performed before a later load. Threads know nothing of each other.
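
As a rough illustration in C++ (a sketch with made-up names, using standalone fences rather than ordered loads and stores): each `std::atomic_thread_fence` below only constrains the operations of the thread that executes it. The cross-thread guarantee appears only because the relaxed load in thread 2 reads the value written by the relaxed store in thread 1.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Illustrative name; same data/ready example, expressed with fences.
int run_with_fences() {
    int data = 0;                      // plain, non-atomic payload
    std::atomic<bool> ready{false};
    int seen = -1;

    std::thread t1([&] {
        data = 100;
        // Orders the write to data before the later store, in this thread.
        std::atomic_thread_fence(std::memory_order_release);
        ready.store(true, std::memory_order_relaxed);
    });

    std::thread t2([&] {
        while (!ready.load(std::memory_order_relaxed)) {
            std::this_thread::yield();
        }
        // Orders the earlier load before the read of data, in this thread.
        std::atomic_thread_fence(std::memory_order_acquire);
        seen = data;
    });

    t1.join();
    t2.join();
    assert(seen == 100);
    return seen;
}
```

Neither fence refers to the other thread or to the other fence; remove either one and the read of `data` becomes a data race.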