by Tuna-Fish
297 days ago
On modern CPUs, atomic adds are now reasonably fast, but only when they are uncontended. If the cache line holding the value has to bounce between CPUs, that usually adds 100 ns or so (nanoseconds, not cycles) to each operation. Writing performant parallel code always means minimizing communication between threads.
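To make that concrete, here is a rough Rust sketch of the two patterns. The function names `count_shared` and `count_local` are made up for illustration, and nothing here measures the latency figure above; it only shows where the cross-thread traffic comes from and one common way to avoid it.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Contended version: every increment from every thread is an atomic add
/// on the same location, so its cache line keeps moving between cores.
fn count_shared(threads: usize, iters: u64) -> u64 {
    let total = AtomicU64::new(0);
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..iters {
                    total.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });
    total.load(Ordering::Relaxed)
}

/// Low-communication version: each thread counts in a private local
/// variable and touches the shared atomic exactly once, at the end.
fn count_local(threads: usize, iters: u64) -> u64 {
    let total = AtomicU64::new(0);
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                let mut local = 0u64;
                for _ in 0..iters {
                    local += 1;
                }
                // One atomic add per thread instead of one per iteration.
                total.fetch_add(local, Ordering::Relaxed);
            });
        }
    });
    total.load(Ordering::Relaxed)
}
```

Both functions return `threads * iters`. The difference is that the first does that many contended atomic adds while the second does only `threads` of them. How big the gap is depends on the hardware and core count, so time it on your own machine rather than trusting a number from a comment.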