| Let's go to primary sources: https://developer.arm.com/documentation/ddi0487/latest

The author is right. ARMv8 supports relaxed memory ordering but its memory model does not support acquire-release ordering without sequential consistency. (Update: Support for weaker acquire-release ordering was added in a later revision.)

I'll quote a relevant excerpt from B2.3.11:

"Where a Load-Acquire appears in program order after a Store-Release, the memory access generated by the Store-Release instruction is Observed-by each PE to the extent that PE is required to observe the access coherently, before the memory access generated by the Load-Acquire instruction is Observed-by that PE, to the extent that the PE is required to observe the access coherently."

Sequential consistency is needed to avoid store/load reordering in cases like this:

// Thread 1
flag1.store(1, SeqCst)
if flag2.load(SeqCst) == 0 {
    // Guarded action
}

// Thread 2
flag2.store(1, SeqCst)
if flag1.load(SeqCst) == 0 {
    // Guarded action
}

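For anyone who wants to poke at this locally, here is a rough, runnable Rust sketch of the two-thread pattern above. The helper names `run_once` and `count_both_entered`, the `Barrier` used to line the threads up, and the run count are my own illustrative choices, not something taken from the godbolt examples. With `SeqCst` on all four operations the count of "both threads entered" must be zero. With `Release` stores and `Acquire` loads the language memory model permits a non-zero count, but whether you actually observe one depends on the hardware, the compiler, and timing, so treat the second number as anecdotal.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Barrier;
use std::thread;

/// Runs the two-thread flag handshake once and reports, for each thread,
/// whether it saw the other thread's flag still at 0 (i.e. would have
/// entered its guarded action).
fn run_once(store_order: Ordering, load_order: Ordering) -> (bool, bool) {
    let flag1 = AtomicU32::new(0);
    let flag2 = AtomicU32::new(0);
    // Hypothetical helper: lines the threads up so the race is more likely.
    let start = Barrier::new(2);

    thread::scope(|s| {
        let t1 = s.spawn(|| {
            start.wait();
            flag1.store(1, store_order);
            flag2.load(load_order) == 0
        });
        let t2 = s.spawn(|| {
            start.wait();
            flag2.store(1, store_order);
            flag1.load(load_order) == 0
        });
        (t1.join().unwrap(), t2.join().unwrap())
    })
}

/// Counts the runs in which both threads would have entered the guarded action.
fn count_both_entered(store_order: Ordering, load_order: Ordering, runs: usize) -> usize {
    (0..runs)
        .filter(|_| run_once(store_order, load_order) == (true, true))
        .count()
}

fn main() {
    let runs = 2_000;
    let seq_cst = count_both_entered(Ordering::SeqCst, Ordering::SeqCst, runs);
    let acq_rel = count_both_entered(Ordering::Release, Ordering::Acquire, runs);
    println!("SeqCst: both entered in {seq_cst} of {runs} runs");
    println!("Release/Acquire: both entered in {acq_rel} of {runs} runs");
}
```

A zero in the Release/Acquire row does not show the weaker ordering is safe here; it only means the reordering did not show up in that particular run on that particular machine.
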
If the loads and stores in the two-thread example were instead implemented with acquire/release ordering as defined by the C++ or Rust memory model, the resulting happens-before constraints would not prevent both threads from executing their guarded actions.

The excerpt from the Architecture Reference Manual says that if you use their load-acquire (ldar) and store-release (stlr) instructions, it is not possible for the store-release to be moved after the load-acquire, as observed by PEs (processing elements, their abstraction of hardware threads).

Let's look at how C++ compilers implement acquire-release vs sequential consistency on x86 and ARMv8: https://godbolt.org/z/3fd5jse18

The machine code on ARMv8 is identical for thread_acq_rel and thread_seq_cst, whereas on x86 the thread_seq_cst code has to use xchg (an alternative to store + mfence) to achieve sequential consistency.

Update: shachaf pointed out that ARMv8 more recently added support for weaker acquire-release semantics in the ARMv8.3 revision. It looks like the first processor to ship with ARMv8.3 support was the A12X from Apple in 2018, which is 5 years after Herb's talk. If we take the code from before and compile for ARMv8 with all architectural features enabled, we see different machine code for thread_acq_rel, which uses the newer ldaprb instruction: https://godbolt.org/z/dnP9sebcz

This illustrates a difficulty with talking about "ARMv8" as a fixed thing. It's much more of a rapidly moving target than x86. That said, the ARMv8.3 addendum should have been mentioned, at least parenthetically; I emailed the author suggesting an info box. |