Hacker News
by namibj 2513 days ago
Since NVLink2 and POWER9, even a GPU can issue atomics over the bus, and they are executed locally at the CPU that owns the cache line, so the line does not have to migrate to the GPU for every update. This is very useful in high-contention, write-heavy workloads such as atomic counters or accumulators.
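
For illustration, here is a minimal CUDA sketch of the counter case, assuming a GPU of compute capability 6.0 or newer (build with something like `nvcc -arch=sm_60`). The names `bump` and `count_on_gpu` are made up for this example. `atomicAdd_system` only requests system scope, meaning the update is atomic with respect to the CPU and other GPUs; whether it is actually carried out at the CPU over the link or the page is migrated first depends on the platform.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: every thread bumps the same counter once,
// using a system-scope atomic (atomic w.r.t. CPU and peer GPUs).
__global__ void bump(unsigned long long *counter) {
    atomicAdd_system(counter, 1ULL);
}

// Hypothetical host helper: runs blocks * threads increments on the GPU
// and returns the final counter value (0 if any CUDA call fails).
unsigned long long count_on_gpu(int blocks, int threads) {
    unsigned long long *counter = nullptr;
    // Managed memory is reachable from both CPU and GPU.
    if (cudaMallocManaged(&counter, sizeof(*counter)) != cudaSuccess) return 0;
    *counter = 0;

    bump<<<blocks, threads>>>(counter);
    // Wait for the kernel before touching the counter from the CPU.
    cudaError_t err = cudaDeviceSynchronize();

    unsigned long long result = (err == cudaSuccess) ? *counter : 0;
    cudaFree(counter);
    return result;
}
```

If everything works, `count_on_gpu(64, 256)` returns `blocks * threads`, since each thread adds exactly one. The same pattern applies to accumulators: lots of writers hitting one location, with nobody needing the intermediate value.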