by namibj
2513 days ago
Since NVLink2 and POWER9, even a GPU can issue atomics over the bus, and they are executed locally at the CPU that owns the cache line.
This is very useful in high-contention, write-heavy workloads such as atomic counters or accumulators.
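
For illustration, here is a minimal sketch of the contended-counter case using CUDA's system-scope atomics. It assumes a GPU of compute capability 6.0 or newer (compile with something like `nvcc -arch=sm_60`) and a platform where managed memory supports system-wide atomics. The names `bump` and `count_on_gpu` are hypothetical, made up for this example. It only shows the API side: whether the atomic is actually carried out at the CPU depends on the interconnect and on where the page currently lives, which this sketch does not control.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Every thread increments the same counter with a system-scope atomic,
// so the update is atomic with respect to the CPU and other GPUs as well.
__global__ void bump(unsigned long long *counter) {
    atomicAdd_system(counter, 1ULL);
}

// Hypothetical helper: launches blocks * threads increments and returns
// the final counter value, or 0 if any CUDA call fails.
unsigned long long count_on_gpu(int blocks, int threads) {
    unsigned long long *counter = nullptr;
    // Managed memory is addressable from both the CPU and the GPU.
    if (cudaMallocManaged(&counter, sizeof(*counter)) != cudaSuccess) {
        return 0;
    }
    *counter = 0;

    bump<<<blocks, threads>>>(counter);

    unsigned long long result = 0;
    // Wait for the kernel before reading the counter on the host.
    if (cudaDeviceSynchronize() == cudaSuccess) {
        result = *counter;
    }
    cudaFree(counter);
    return result;
}

int main() {
    // On success the printed value should equal blocks * threads.
    printf("%llu\n", count_on_gpu(64, 256));
    return 0;
}
```

All threads hit a single cache line here, which is the high-contention case where having the owner execute the atomic avoids bouncing the line back and forth.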