Hacker News
by Tuna-Fish 297 days ago
On modern CPUs, atomic adds are reasonably fast, but only when they are uncontended. If the cache line holding the value has to bounce between CPUs, that usually costs an extra ~100 ns or so (nanoseconds, not cycles).

Writing performant parallel code always means absolutely minimizing communication between threads.
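
To make that concrete, here is a rough sketch of the two extremes, assuming a plain counting workload. The function names `count_shared` and `count_local` are made up for illustration. The first has every thread do an atomic add on the same variable, so its cache line is contended; the second keeps the running sum thread-local and touches the shared value once per thread.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper: every thread increments one shared atomic,
// so the cache line holding `total` bounces between cores.
std::uint64_t count_shared(unsigned num_threads, std::uint64_t iters_per_thread) {
    std::atomic<std::uint64_t> total{0};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&total, iters_per_thread] {
            for (std::uint64_t i = 0; i < iters_per_thread; ++i) {
                total.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) w.join();
    return total.load();
}

// Hypothetical helper: each thread sums into a local variable and
// publishes the result with a single atomic add at the end.
std::uint64_t count_local(unsigned num_threads, std::uint64_t iters_per_thread) {
    std::atomic<std::uint64_t> total{0};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&total, iters_per_thread] {
            std::uint64_t local = 0;  // private to this thread, no sharing
            for (std::uint64_t i = 0; i < iters_per_thread; ++i) {
                local += 1;
            }
            // The only cross-thread communication: one add per thread.
            total.fetch_add(local, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();
    return total.load();
}
```

Both return `num_threads * iters_per_thread`. An optimizing compiler may collapse the local loop entirely, so this shows the structure rather than a fair benchmark; how large the gap is will depend on the hardware.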

1 comment

Sure, but even the uncontended case is ~10x slower than a regular ADD.