by p1esk 3397 days ago
Sounds like your application is latency sensitive rather than bandwidth sensitive. Take a look at the graphs towards the end of this article:

https://www.starwindsoftware.com/blog/numa-and-cluster-on-di...

There's not much difference in memory bandwidth between crossing domains on the same die (COD) and crossing domains system-wide (accessing memory on a different socket). What kind of computation are you running?

1 comment

I'm talking about Netflix CDN servers. The workload is primarily file serving. The twist is that we use an OS that is not NUMA-aware (FreeBSD).

We're not latency sensitive at all. The problem we run into with NUMA is that we totally saturate QPI due to FreeBSD's lack of NUMA awareness.
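A back-of-envelope sketch of why NUMA-unaware placement loads the interconnect so heavily. Everything here is an illustrative assumption, not a measurement from these servers: pages are taken to land on a uniformly random node regardless of which device or core touches them, and each served byte is assumed to touch memory twice (once written in from storage, once read out by the NIC). The function names and the units are hypothetical.

```python
def remote_fraction(num_nodes: int) -> float:
    """Fraction of memory accesses that hit a remote node, assuming page
    placement is uniform across nodes and independent of the accessor."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return (num_nodes - 1) / num_nodes


def interconnect_load(throughput: float, num_nodes: int,
                      touches_per_byte: int = 2) -> float:
    """Expected interconnect traffic, in the same units as `throughput`.

    touches_per_byte is an assumption: each served byte is written to
    memory once and read from memory once, and either touch may be remote.
    """
    return throughput * touches_per_byte * remote_fraction(num_nodes)


if __name__ == "__main__":
    # Two sockets, arbitrary throughput units: under these assumptions the
    # interconnect carries about as much traffic as the server sends out.
    print(interconnect_load(100.0, num_nodes=2))
```

Under these assumptions a two-socket box pushes roughly one byte across the interconnect for every byte it serves, which is the kind of pressure that NUMA-aware placement is meant to avoid.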

The results you link to don't match what we've seen on our HCC Broadwell CPUs, at least with COD disabled. That said, we only really look at aggregate system bandwidth, so the slowness accessing the "far" memory on the same socket may be latency driven and wash out in aggregate.