by MaxGanzII 1554 days ago
An atomic write will need to be issued, and on the face of it, that C11 code is not issuing an atomic write. I could be wrong, but I would expect a particular function or macro to be used for an atomic write, and when it is not, you have normal writes.

So all operations on _Atomic variables in C11 are implicitly atomic and sequentially consistent, which for an RMW like inc implies release plus acquire.
> So all operations on _Atomic variables in C11 are implicitly atomic

I may be wrong, but I would be shocked if this was so.

I think you may be confusing non-tearing writes with atomic writes. Atomic writes are performed using either cache-line locking (Intel) or exclusive reservation granules (everyone else), and as such require a specific mechanism (typically a macro or function, which leads to the use of the particular special instructions for atomic writes).

The overhead of an atomic write is very large, and it is often not needed. It would make no sense for every write (to an _Atomic type) to be atomic.

The standard does guarantee that all operations on _Atomic variables are indeed atomic and, by default, sequentially consistent. Standard library functions like atomic_load_explicit can be used to specify a more relaxed ordering.
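To make that concrete, here is a minimal sketch in C11. The names `counter`, `bump_implicit` and `bump_relaxed` are made up for illustration and are not from the code under discussion. The first function uses only plain operators, which on an _Atomic object are atomic and sequentially consistent; the second asks for relaxed ordering explicitly.

```c
#include <stdatomic.h>

/* Hypothetical shared counter, for illustration only. */
static _Atomic int counter = 0;

/* Plain operators on an _Atomic object: every access below is atomic
 * and uses memory_order_seq_cst, with no macro or function call. */
int bump_implicit(void) {
    counter++;        /* atomic read-modify-write */
    counter += 2;     /* also an atomic read-modify-write */
    return counter;   /* atomic load */
}

/* The same kind of update with the ordering spelled out and relaxed. */
int bump_relaxed(void) {
    atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
    return atomic_load_explicit(&counter, memory_order_relaxed);
}
```

Both functions are safe to call from several threads at once; the only difference is how much ordering they impose on surrounding memory accesses.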

As per the Intel docs, all aligned loads and stores are atomic, with acquire semantics for loads and release semantics for stores. XCHG (which is somewhat expensive) is used for SC stores, but plain loads still suffice for SC loads.

On other architectures, loads and stores are usually at least atomic, although with only relaxed ordering semantics.
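A rough sketch of how the store orderings differ in source, assuming a hypothetical flag/payload pair (`flag`, `payload`, `publish_sc`, `publish_release` and `consume` are illustrative names). The codegen notes in the comments describe what GCC and Clang typically emit on x86-64, not something the standard promises.

```c
#include <stdatomic.h>

static _Atomic int flag = 0;  /* hypothetical "data is ready" flag */
static int payload = 0;       /* ordinary, non-atomic data */

/* Default (seq_cst) store. On x86-64 compilers typically emit XCHG,
 * or MOV followed by MFENCE, for the store to flag. */
void publish_sc(int v) {
    payload = v;
    flag = 1;
}

/* Release store. On x86-64 this is typically a plain MOV, because
 * ordinary aligned stores already have release semantics there. */
void publish_release(int v) {
    payload = v;
    atomic_store_explicit(&flag, 1, memory_order_release);
}

/* Acquire load, typically a plain MOV on x86-64. Returns the payload
 * if it has been published, otherwise -1. */
int consume(void) {
    if (atomic_load_explicit(&flag, memory_order_acquire))
        return payload;
    return -1;
}
```

On a weakly ordered architecture the release and acquire variants would normally need barrier or special load/store instructions, while relaxed accesses would not.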

Normally non-tearing + cache coherence is all that is required for relaxed atomic loads/stores. You are confusing these with general RMWs, which require specialized instructions on Intel or LL/SC on RISCs (although ARM did add a bunch of specialized atomic instructions as well).

To be pedantic, as far as I know, Intel doesn't lock the cache line in any special way during an atomic RMW. It simply delays the read until all preceding stores have been flushed from the store buffer; then, once it has acquired the line in exclusive mode, it executes the load+store within whatever minimum exclusive hold period the coherence protocol guarantees. Acquiring a cache line in exclusive mode is not specific to atomic RMWs but applies to any store, and it is not really a lock, as it can be taken away at any moment (i.e. the cache coherence arbiter guarantees forward progress of the system as a whole).
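For contrast, this is the kind of operation that does need the special instructions: a general RMW built from a compare-exchange loop. `fetch_max` is a hypothetical helper, not a standard function. As far as I know, compilers usually lower the compare-exchange to LOCK CMPXCHG on x86 and to an LL/SC loop (or a dedicated CAS instruction on newer ARM) elsewhere.

```c
#include <stdatomic.h>

/* Hypothetical helper: atomically raise *target to at least v.
 * Returns the value observed just before the update, or the current
 * value if no update was needed. */
int fetch_max(_Atomic int *target, int v) {
    int seen = atomic_load_explicit(target, memory_order_relaxed);
    while (seen < v &&
           !atomic_compare_exchange_weak_explicit(target, &seen, v,
                                                  memory_order_seq_cst,
                                                  memory_order_relaxed)) {
        /* The exchange failed (possibly spuriously); seen now holds the
         * current value, so the loop re-checks the condition and retries. */
    }
    return seen;
}
```

The weak variant is used because it sits in a retry loop anyway, which lets LL/SC targets avoid an extra inner loop.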