by p1esk
3397 days ago
Sounds like your application is latency sensitive rather than bandwidth sensitive. Take a look at the graphs towards the end of this article: https://www.starwindsoftware.com/blog/numa-and-cluster-on-di... There's not much difference in memory bandwidth between crossing domains on the same die (COD) and crossing domains system-wide (accessing memory attached to a different socket). What kind of computation are you running?
We're not latency sensitive at all. The problem we run into with NUMA is that we totally saturate QPI due to FreeBSD's lack of NUMA awareness.
The results you link to don't match what we've seen on our HCC Broadwell CPUs, at least with COD disabled. That said, we only really look at aggregate system bandwidth, so the slowness accessing the "far" memory on the same socket may be latency-driven and fall away in aggregate.