by gpderetta 1554 days ago
The standard does guarantee that all operations on _Atomic variables are indeed atomic and, by default, sequentially consistent. Standard library functions like atomic_load_explicit can be used to specify a more relaxed ordering.
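To make the default-versus-explicit distinction concrete, here is a minimal C11 sketch. The variable `counter` and the function names are made up for illustration; only the `<stdatomic.h>` calls are standard.

```c
#include <stdatomic.h>

/* Hypothetical shared variable, zero-initialized. */
static _Atomic int counter;

/* Plain assignment and reads on an _Atomic object are atomic
   and sequentially consistent (memory_order_seq_cst). */
void store_default(int v) { counter = v; }
int load_default(void) { return counter; }

/* The _explicit variants take the memory order as an argument. */
void store_relaxed(int v)
{
    atomic_store_explicit(&counter, v, memory_order_relaxed);
}

int load_acquire(void)
{
    return atomic_load_explicit(&counter, memory_order_acquire);
}
```

atomic_load(&counter) without the _explicit suffix behaves like the plain read: it is always seq_cst.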

As per the Intel docs, all aligned loads and stores on x86 are atomic; in addition, loads have acquire semantics and stores have release semantics. XCHG (which is somewhat expensive) is used for SC stores, but plain loads still suffice for SC loads.
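As a sketch of what that means in practice, consider a release/acquire hand-off next to a seq_cst store. The names `payload`, `ready`, `publish` and `try_consume` are hypothetical, and the instructions mentioned in the comments are what compilers typically emit on x86-64, not something the C standard promises.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;        /* ordinary data, published via the flag */
static atomic_bool ready;  /* zero-initialized, i.e. false */

/* Release store: on x86 this is typically a plain MOV, because
   every aligned store already has release semantics there. */
void publish(int v)
{
    payload = v;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Acquire load: also typically a plain MOV on x86. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;  /* ordered after the acquire load of the flag */
    return true;
}

/* seq_cst store: this is where compilers usually emit XCHG
   (or MOV followed by MFENCE) on x86. */
void publish_sc(int v)
{
    payload = v;
    atomic_store(&ready, true);
}
```

The matching seq_cst load, atomic_load(&ready), is usually still just a MOV on x86, which is why the cost sits on the store side.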

On other architectures, loads and stores are usually at least atomic, although with only relaxed ordering semantics.

Normally non-tearing + cache coherence is all that is required for relaxed atomic loads/stores. You are confusing this with general RMWs, which require specialized instructions on Intel or LL/SC on RISCs (although ARM did add a bunch of specialized atomic instructions as well). To be pedantic, as far as I know, Intel doesn't lock the cacheline in any special way during an atomic RMW: it simply delays the read until all preceding stores have been flushed from the store buffer, then, once it has successfully acquired the line in exclusive mode, executes the load+store within whatever minimum exclusive hold period the coherence protocol guarantees. Acquiring a cacheline in exclusive mode is not specific to atomic RMWs but applies to any store, and it is not really a lock, as it can be taken away at any moment (i.e. the cache-coherence arbiter guarantees forward progress of the system as a whole).
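A small C11 sketch of the load/store versus RMW distinction, with hypothetical names (`hits`, `increment_racy`, etc.). The instruction names in the comments are typical compiler output, stated as an assumption rather than a guarantee.

```c
#include <stdatomic.h>

static atomic_int hits;  /* hypothetical shared counter, starts at 0 */

/* Two separate relaxed operations: each one is tear-free, but
   another thread can store in between them, so concurrent
   increments can be lost. This is NOT an atomic RMW. */
void increment_racy(void)
{
    int v = atomic_load_explicit(&hits, memory_order_relaxed);
    atomic_store_explicit(&hits, v + 1, memory_order_relaxed);
}

/* A real RMW: typically LOCK XADD on x86, and an LL/SC loop or a
   single atomic instruction on newer ARM cores. Returns the value
   held before the increment. */
int increment_rmw(void)
{
    return atomic_fetch_add_explicit(&hits, 1, memory_order_relaxed);
}

/* An arbitrary RMW (doubling) built from compare-exchange; the
   loop retries if another thread changed the value or if the
   weak CAS failed spuriously. Returns the value it replaced. */
int double_rmw(void)
{
    int old = atomic_load_explicit(&hits, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(
               &hits, &old, old * 2,
               memory_order_relaxed, memory_order_relaxed)) {
        /* on failure, old has been reloaded with the current value */
    }
    return old;
}
```

Single-threaded, increment_racy and increment_rmw look identical; the difference only shows up when several threads hit the same counter.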