by lallysingh
4762 days ago
Specifically: "in multicore multisocket machines, there is often a tradeoff between optimizing NUMA performance by clustering threads close to the memory nodes to increase the amount of local accesses and optimizing for cache performance by spreading threads to reduce the cache contention". In other words, the performance benefit of socket-local memory accesses may not be worth packing every thread that uses that memory onto that socket's CPUs, because each thread then gets too small a share of the cache.
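
To put rough numbers on that tradeoff, here is a toy back-of-the-envelope sketch. Everything in it is an assumption for illustration: a hypothetical 2-socket box, a made-up 20 MB of shared cache per socket, 8 threads, all data sitting in socket 0's memory, and an even split of cache among the threads on a socket. It is not a model of any real machine and not from the quoted paper.

```python
# Toy model only: the machine shape and cache size below are made-up numbers.
SOCKETS = 2
LLC_MB_PER_SOCKET = 20.0  # hypothetical shared last-level cache per socket
THREADS = 8
DATA_SOCKET = 0           # assume all the data lives in socket 0's memory


def cache_share_mb(threads_on_socket):
    """Even split of one socket's shared cache among the threads placed on it."""
    return LLC_MB_PER_SOCKET / threads_on_socket


def local_fraction(threads_per_socket):
    """Fraction of threads running on the socket that holds the data."""
    return threads_per_socket[DATA_SOCKET] / sum(threads_per_socket)


def summarize(threads_per_socket):
    """Return (local fraction, cache share per thread on the busiest socket)."""
    busiest = max(threads_per_socket)
    return local_fraction(threads_per_socket), cache_share_mb(busiest)


# Clustered: every thread on the data's socket. Spread: threads split evenly.
clustered = summarize([THREADS, 0])
spread = summarize([THREADS // SOCKETS] * SOCKETS)

for name, (local, share) in (("clustered", clustered), ("spread", spread)):
    print(f"{name}: {local:.0%} of threads local, {share:.1f} MB cache each")
```

With these made-up numbers, clustering makes every thread local but leaves each one 2.5 MB of cache, while spreading halves the local threads and doubles the share to 5 MB. Which one wins depends on the working set size and the remote-access penalty, and this sketch models neither.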