Hacker News
by gpderetta 2516 days ago
Yes. Even better than that, atomic instructions are usually completely local to a core. I think the only interaction with the coherency protocol is that a core is guaranteed to be able to hold a cache line in exclusive mode long enough to execute an RMW (and even that is not strictly required, but it is useful for guaranteeing forward progress).
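A minimal C++ sketch of the two usual ways to write an atomic increment, assuming a cache-coherent multicore CPU and C++11 `std::atomic`. The function names `count_with_fetch_add` and `count_with_cas_loop` are hypothetical. On x86 a `fetch_add` whose result is unused typically compiles to a single locked instruction (e.g. `lock add` or `lock xadd`). A CAS loop can fail and retry when another core takes the line between the load and the CAS. On LL/SC architectures even `fetch_add` may become such a retry loop, which is where the forward-progress guarantee mentioned above matters.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper: each of `nthreads` threads increments one shared
// counter `iters` times with a single atomic read-modify-write (fetch_add).
// The core performing the RMW needs the cache line in an exclusive state
// for the duration of the operation; no separate "atomic unit" is involved.
std::uint64_t count_with_fetch_add(int nthreads, int iters) {
    assert(nthreads > 0 && iters >= 0);
    std::atomic<std::uint64_t> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < nthreads; ++t) {
        workers.emplace_back([&counter, iters] {
            for (int i = 0; i < iters; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter.load();
}

// Hypothetical helper: same result built from a compare-and-swap loop.
// An attempt fails if another core modified the counter after our load;
// compare_exchange_weak then stores the current value into `seen` and we retry.
std::uint64_t count_with_cas_loop(int nthreads, int iters) {
    assert(nthreads > 0 && iters >= 0);
    std::atomic<std::uint64_t> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < nthreads; ++t) {
        workers.emplace_back([&counter, iters] {
            for (int i = 0; i < iters; ++i) {
                std::uint64_t seen = counter.load(std::memory_order_relaxed);
                while (!counter.compare_exchange_weak(seen, seen + 1,
                                                      std::memory_order_relaxed)) {
                    // `seen` now holds the freshly observed value; try again.
                }
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter.load();
}
```

Both functions should return `nthreads * iters`; they differ only in how the RMW is expressed, not in the final result. Building usually requires linking the platform's thread library (e.g. `-pthread`).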

Since NVLink2 and POWER9, even a GPU can issue atomics over the bus; they are executed locally by the CPU that owns the cache line. This is very useful in high-contention, write-heavy workloads such as atomic counters or accumulators.