by psykotic 1247 days ago

Let's go to primary sources: https://developer.arm.com/documentation/ddi0487/latest

The author is right. ARMv8 supports relaxed memory ordering, but as originally specified its memory model offers no acquire-release ordering weaker than sequential consistency: its load-acquire and store-release instructions are ordered with respect to each other. (Update: Support for weaker acquire-release ordering was added in a later revision.)

I'll quote a relevant excerpt from B2.3.11:

    Where a Load-Acquire appears in program order after a Store-Release, the memory access generated by the Store-Release instruction is Observed-by each PE to the extent that PE is required to observe the access coherently, before the memory access generated by the Load-Acquire instruction is Observed-by that PE, to the extent that the PE is required to observe the access coherently.

Sequential consistency is needed to avoid store/load reordering in cases like this:

    // Thread 1
    flag1.store(1, SeqCst)
    if flag2.load(SeqCst) == 0 {
        // Guarded action
    }

    // Thread 2
    flag2.store(1, SeqCst)
    if flag1.load(SeqCst) == 0 {
        // Guarded action
    }

If these were instead implemented with acquire/release ordering as defined by the C++ or Rust memory model, the resulting happens-before constraints would not prevent both threads from executing their guarded actions.
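For anyone who wants to poke at this, here is a minimal Rust sketch of the same two-flag handshake with the orderings passed in as parameters. The names run_round and count_both_entered are my own, made up for illustration, and the harness is deliberately crude. With SeqCst the count of rounds where both threads run their guarded action must be zero. With Release stores and Acquire loads the language permits a nonzero count, though whether you ever observe one depends on the hardware and on timing.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

/// Runs one round of the two-flag handshake and reports, for each thread,
/// whether it saw the other flag still at 0 (i.e. would run its guarded action).
/// `store_order` must be valid for stores and `load_order` for loads.
fn run_round(store_order: Ordering, load_order: Ordering) -> (bool, bool) {
    let flag1 = AtomicU32::new(0);
    let flag2 = AtomicU32::new(0);
    thread::scope(|s| {
        // Thread 1: raise flag1, then check flag2.
        let t1 = s.spawn(|| {
            flag1.store(1, store_order);
            flag2.load(load_order) == 0
        });
        // Thread 2: raise flag2, then check flag1.
        let t2 = s.spawn(|| {
            flag2.store(1, store_order);
            flag1.load(load_order) == 0
        });
        (t1.join().unwrap(), t2.join().unwrap())
    })
}

/// Counts the rounds in which both threads would have run their guarded action.
fn count_both_entered(rounds: usize, store_order: Ordering, load_order: Ordering) -> usize {
    (0..rounds)
        .filter(|_| run_round(store_order, load_order) == (true, true))
        .count()
}
```

Calling count_both_entered(1000, Ordering::SeqCst, Ordering::SeqCst) should always return 0. Swapping in Ordering::Release and Ordering::Acquire removes that guarantee, but thread start-up jitter makes the bad interleaving rare in a harness like this, so a zero there proves nothing.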

The excerpt from the Architecture Reference Manual says that if you use their load-acquire (ldar) and store-release (stlr) instructions, a store-release cannot be reordered after a later load-acquire, as observed by PEs (processing elements, their abstraction of hardware threads). That is exactly the store/load reordering that acquire/release ordering in the C++ or Rust sense would permit.

Let's look at how C++ compilers implement acquire-release vs sequential consistency on x86 and ARMv8:

https://godbolt.org/z/3fd5jse18

The machine code on ARMv8 is identical for thread_acq_rel and thread_seq_cst, whereas on x86 the thread_seq_cst code has to use xchg (an alternative to store + mfence) to achieve sequential consistency.
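The code behind the link isn't reproduced here, so as a rough stand-in this is the kind of pair of functions I'd paste into Compiler Explorer to make the comparison in Rust. The function names just mirror the ones mentioned above; the signatures and the choice of AtomicU8 are my assumptions, not a copy of the linked code.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Release store followed by an acquire load.
/// Returns true if the other flag was still 0.
pub fn thread_acq_rel(mine: &AtomicU8, theirs: &AtomicU8) -> bool {
    mine.store(1, Ordering::Release);
    theirs.load(Ordering::Acquire) == 0
}

/// The same sequence with sequentially consistent accesses.
pub fn thread_seq_cst(mine: &AtomicU8, theirs: &AtomicU8) -> bool {
    mine.store(1, Ordering::SeqCst);
    theirs.load(Ordering::SeqCst) == 0
}
```

Compiling with something like rustc -O --emit asm, once for an x86-64 target and once for an AArch64 target, and diffing the two function bodies should show the same pattern as the C++ version, since the lowering of atomics is done by the backend rather than the front end. I haven't pinned a compiler version here, so treat the exact instructions as something to check rather than assume.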

Update: shachaf pointed out that ARMv8 more recently added support for weaker acquire-release semantics in the ARMv8.3 revision. It looks like the first processor to ship with ARMv8.3 support was the A12X from Apple in 2018, which is 5 years after Herb's talk. If we take the code from before and compile for ARMv8 with all architectural features enabled, we see different machine code for thread_acq_rel, which now uses the newer ldaprb instruction for the acquire load:

https://godbolt.org/z/dnP9sebcz

This illustrates a difficulty with talking about "ARMv8" as a fixed thing. It's much more of a rapidly moving target than x86. That said, the ARMv8.3 addendum should have been mentioned, at least parenthetically; I emailed the author suggesting an info box.