by drewg123 3397 days ago
Very true, I should have mentioned that. At least for us, COD doesn't seem to impact our performance at all, while NUMA does. I'm hoping that Naples is the same for us.

However, there is an important difference. AMD seems to be putting multiple dies into the same package, whereas Intel seems to have everything on the same die (as the Cluster on Die name implies). So my fear is that the interconnect between dies may not be fast enough to paper over our NUMA weaknesses.


It sounds like your application is latency-sensitive rather than bandwidth-sensitive. Take a look at the graphs toward the end of this article:

https://www.starwindsoftware.com/blog/numa-and-cluster-on-di...

There's not much difference in memory bandwidth between crossing domains on the same die (COD) and crossing domains system-wide (accessing memory attached to a different socket). What kind of computation are you running?

I'm talking about Netflix CDN servers. The workload is primarily file serving. The twist is that we use a non-NUMA-aware OS (FreeBSD).

We're not latency-sensitive at all. The problem we run into with NUMA is that we completely saturate QPI because of FreeBSD's lack of NUMA awareness.
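A back-of-the-envelope sketch of why a NUMA-oblivious OS can saturate the interconnect, assuming (hypothetically) that pages and the threads touching them are spread uniformly at random across N NUMA domains with no affinity between them. Under that assumption a fraction (N-1)/N of memory traffic is remote: half of it on a 2-socket box, three quarters if a package exposes 4 domains. The function names and the 100 GB/s figure below are illustrative only, not measurements from the thread.

```python
def remote_fraction(num_domains: int) -> float:
    """Expected share of memory accesses that hit a remote NUMA domain,
    assuming uniform, affinity-free placement of pages and threads."""
    if num_domains < 1:
        raise ValueError("need at least one NUMA domain")
    return (num_domains - 1) / num_domains


def interconnect_traffic_gbs(total_memory_gbs: float, num_domains: int) -> float:
    """Aggregate memory traffic (GB/s) that must cross inter-domain links
    (e.g. QPI between sockets) under the same uniform-placement assumption."""
    return total_memory_gbs * remote_fraction(num_domains)


if __name__ == "__main__":
    # Hypothetical total memory traffic, chosen only to make the arithmetic easy.
    total = 100.0
    for n in (1, 2, 4):
        print(f"{n} domain(s): {remote_fraction(n):.0%} remote, "
              f"{interconnect_traffic_gbs(total, n):.0f} GB/s over the interconnect")
```

A NUMA-aware kernel attacks exactly this term by allocating pages on the domain of the CPU (or device) that will use them, pushing the remote fraction well below (N-1)/N.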

The results you link to don't match what we've seen on our HCC Broadwell CPUs, at least with COD disabled. That said, we only really look at aggregate system bandwidth, so the slowness in accessing the "far" memory on the same socket may be latency-driven and simply fall away in aggregate.
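One way to see how a latency penalty can "fall away in aggregate" is Little's law: a core with a fixed number of outstanding cache-line misses can sustain roughly outstanding × line_size / latency bytes per second, so higher "far" latency lowers per-core bandwidth. Once enough cores are active, though, the total is capped by the memory or interconnect limit rather than by latency. The sketch below assumes 64-byte lines, 10 outstanding misses per core, and made-up latencies (80 ns near, 128 ns far) and a made-up 60 GB/s cap; none of these numbers come from the thread.

```python
CACHE_LINE_BYTES = 64  # typical x86 cache-line size


def per_core_bandwidth_gbs(outstanding_misses: int, latency_ns: float,
                           line_bytes: int = CACHE_LINE_BYTES) -> float:
    """Little's law: throughput = concurrency / latency.
    bytes per nanosecond is numerically equal to GB/s (1 GB = 1e9 bytes)."""
    if latency_ns <= 0:
        raise ValueError("latency must be positive")
    return outstanding_misses * line_bytes / latency_ns


def aggregate_bandwidth_gbs(cores: int, outstanding_per_core: int,
                            latency_ns: float, peak_gbs: float) -> float:
    """Total bandwidth is the sum of per-core demand, capped by a
    (hypothetical) memory/interconnect peak."""
    demand = cores * per_core_bandwidth_gbs(outstanding_per_core, latency_ns)
    return min(demand, peak_gbs)


if __name__ == "__main__":
    near_ns, far_ns, peak = 80.0, 128.0, 60.0  # illustrative values only
    for cores in (1, 20):
        near = aggregate_bandwidth_gbs(cores, 10, near_ns, peak)
        far = aggregate_bandwidth_gbs(cores, 10, far_ns, peak)
        print(f"{cores:2d} core(s): near {near:.1f} GB/s, far {far:.1f} GB/s")
```

With one core the far case is clearly slower (5 vs 8 GB/s under these assumptions); with 20 cores both hit the 60 GB/s cap, which is the pattern you would expect if only aggregate system bandwidth is being measured.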