Hacker News

by jasonwatkinspdx 2516 days ago

When people say "lock the bus" they don't literally mean there's a single bus guarded by a mutex.

Atomic operations execute on top of the cache coherence protocol, which typically looks like: https://en.wikipedia.org/wiki/MOESI_protocol

It is indeed true that atomic operations will execute in a bounded time, and processors generally provide fairness guarantees as well.
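To make the MOESI link a little more concrete, here is a simplified per-line state machine as seen by one cache. It is a sketch, not a real implementation: it ignores bus transactions, writebacks and the transient states real hardware needs, and the names `State`, `Event` and `moesi_next` are hypothetical. The relevant point for atomics is that an RMW is just a local write, so the core first brings the line to Modified and then operates on its own copy.

```cpp
#include <cassert>
#include <stdexcept>

// Simplified MOESI states for one cache line in one core's cache.
enum class State { Modified, Owned, Exclusive, Shared, Invalid };

// Events this cache sees for the line: its own core's accesses,
// plus snooped requests coming from other caches.
enum class Event { LocalRead, LocalWrite, RemoteRead, RemoteWrite };

// Next-state function. `others_have_copy` only matters on a local read miss:
// Invalid -> Exclusive if no other cache holds the line, else Shared.
State moesi_next(State s, Event e, bool others_have_copy = false) {
    switch (e) {
    case Event::LocalRead:
        if (s == State::Invalid)
            return others_have_copy ? State::Shared : State::Exclusive;
        return s;  // M, O, E and S can all serve reads locally
    case Event::LocalWrite:
        // Writing needs exclusive ownership. From S, O or I the other copies
        // are invalidated first; from E the upgrade is silent. Either way: M.
        return State::Modified;
    case Event::RemoteRead:
        if (s == State::Modified) return State::Owned;   // keep dirty data, supply it
        if (s == State::Exclusive) return State::Shared;
        return s;  // O stays O, S stays S, I stays I
    case Event::RemoteWrite:
        return State::Invalid;  // another core is writing: drop our copy
    }
    throw std::logic_error("unreachable");
}
```

For example, a line that was Modified here and is then read by another core moves to Owned, so this cache keeps the dirty data and supplies it instead of writing it back first.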

gpderetta 2516 days ago

Yes. Even better than that, atomic instructions are usually completely local to a core. I think the only interaction with the coherency protocol is that a core is guaranteed to be able to hold a cache line in exclusive mode long enough to execute an RMW (and even that is not strictly required, but it is useful to guarantee forward progress).
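The forward-progress point is easiest to see by contrasting a single hardware RMW with a software retry loop. A minimal sketch using `std::atomic`, where `add_rmw` and `add_cas` are hypothetical names: on x86, `fetch_add` typically compiles to one `lock`-prefixed instruction that completes once the core holds the line, while the compare-and-swap version can fail and retry whenever another core changes the value between the load and the CAS. On LL/SC architectures (e.g. ARM without LSE atomics) `fetch_add` may itself compile to a retry loop, so the contrast is ISA-dependent.

```cpp
#include <atomic>
#include <cassert>

// One atomic RMW: the core obtains the line exclusively, then adds locally.
long add_rmw(std::atomic<long>& x, long delta) {
    return x.fetch_add(delta, std::memory_order_relaxed);
}

// The same operation built from compare-and-swap. A failed
// compare_exchange_weak (another core won, or a spurious failure)
// reloads the current value into `expected`, and we try again.
long add_cas(std::atomic<long>& x, long delta) {
    long expected = x.load(std::memory_order_relaxed);
    while (!x.compare_exchange_weak(expected, expected + delta,
                                    std::memory_order_relaxed)) {
        // `expected` now holds the latest value; retry with it.
    }
    return expected;  // value before our update, matching fetch_add
}
```

Both return the old value and leave the same result in `x`; the difference is only in how many attempts the CAS loop may need under contention.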

namibj 2515 days ago

Since NVLink2 and POWER9, even a GPU can issue atomics over the bus; they are executed locally at the CPU that owns the cache line. This is very useful in high-contention, write-heavy workloads such as atomic counters or accumulators.
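For reference, the CPU-side version of that workload looks roughly like the sketch below, written with `std::atomic` and `std::thread`; it does not model NVLink or GPU-issued atomics at all, and `contended_count` is a hypothetical helper. Every thread updates the same cache line, so the line migrates between cores on nearly every increment, yet the final count is exact because each `fetch_add` is a single atomic RMW.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// `threads` workers each increment one shared counter `iters` times.
// All increments hit the same cache line, i.e. maximum contention.
long contended_count(int threads, long iters) {
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    workers.reserve(threads);
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, iters] {
            for (long i = 0; i < iters; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();  // wait for all increments
    return counter.load();
}
```

`contended_count(4, 100000)` returns 400000 regardless of how the threads interleave; only the running time depends on how well the coherency protocol handles the contended line.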

amelius 2516 days ago

Yes, and the cache hierarchy ultimately depends on the memory bus. I suppose this bus, which may be shared with many other devices, doesn't always have a bounded-time guarantee.

gpderetta 2516 days ago

Even for main memory, there is not necessarily a single memory bus. Intra-core or even intra-socket synchronization need not (and usually doesn't) go through main memory anyway.

amelius 2516 days ago

True, but some atomic instructions may need to access main memory to complete their operation. Whether shortcuts can be taken in most cases is not relevant for worst-case considerations.

bonzini 2516 days ago

They may need to access main memory, but the RMW operation doesn't happen over the memory bus. The processor acquires the cache line just as it would for any other memory access, and then operates atomically on that cache line.

amelius 2516 days ago

And what if the cache line is full/dirty?

bonzini 2516 days ago

The cache coherency protocol takes care of that. In other words, the first part is just a memory load and can vary from 0 to a few hundred clock cycles, while the second is local to the processor and has a more or less fixed cost. The worst-case execution time is completely dominated by the first part; the best case, instead, is dominated by the second.
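That breakdown is a two-term sum: time to obtain the line plus a roughly fixed local cost. The sketch below only encodes that sum; `RmwCost`, `rmw_total` and `rmw_acquire_share` are hypothetical names, and any cycle counts you plug in are placeholders, not measurements of a real CPU.

```cpp
#include <cassert>

// The two parts of an atomic RMW described in the comment above.
struct RmwCost {
    int acquire_cycles;  // get the line in an exclusive state (0 to a few hundred)
    int local_cycles;    // do the RMW on the local copy (more or less fixed)
};

constexpr int rmw_total(RmwCost c) { return c.acquire_cycles + c.local_cycles; }

// Fraction of the total spent obtaining the line.
constexpr double rmw_acquire_share(RmwCost c) {
    return static_cast<double>(c.acquire_cycles) / rmw_total(c);
}
```

With placeholder values `{0, 20}` (line already held) the local part is the whole cost, while with `{300, 20}` (line fetched from far away) acquiring the line accounts for about 94% of it, which is the worst-case/best-case asymmetry described above.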

jasonwatkinspdx 2516 days ago

I'd suggest reading the Wikipedia articles about it for an introduction, and Chapter 5 of https://www.amazon.com/Computer-Architecture-Quantitative-Jo... for a detailed understanding.

Right now you're asserting things about all this without being familiar with relatively basic aspects of how it works.

gpderetta 2516 days ago

Sure, any memory access might go to main memory. There is no special casing for atomics, though.
