

	by gmueckl 2509 days ago
	Epyc 2 has different memory latencies within and across NUMA nodes, according to the information I have. So it is not equally slow for all memory. Can you point me to a source that says otherwise? Edit: my source is this German article: https://www.heise.de/newsticker/meldung/AMD-Server-CPUs-Epyc...

2 comments

jdsully 2509 days ago

See the architecture diagram here: https://www.anandtech.com/show/14694/amd-rome-epyc-2nd-gen/2

Everything goes through the central crossbar on the I/O die, whereas Zen1 had memory attached directly to each CPU chiplet, which would relay as necessary. On Zen1, if you accessed directly attached memory you wouldn't pay the latency penalty from relaying the data. In Zen2 all data is relayed via the I/O die, with the associated delay that entails.


gmueckl 2509 days ago

I did some more digging. It seems like the Linux NUMA topology shown in the AnandTech article is a deliberate lie. There are different latencies between cores and memory controllers on the same socket, but these are deemed insignificant enough not to be exposed in the reported NUMA topology.
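
For anyone who wants to see what their own kernel reports, here is a minimal sketch. It assumes a Linux system that exposes the usual per-node `distance` files under `/sys/devices/system/node`; the function name `read_numa_distances` is made up for illustration. It only shows the distances the platform chooses to report, so latency differences hidden inside a single node will not appear.

```python
from pathlib import Path


def read_numa_distances(root="/sys/devices/system/node"):
    """Return {node_id: [reported distance to node 0, node 1, ...]}.

    Each nodeN/distance file holds one row of the kernel's NUMA
    distance table as space-separated integers.
    """
    distances = {}
    for node_dir in Path(root).glob("node[0-9]*"):
        distance_file = node_dir / "distance"
        if not distance_file.is_file():
            continue
        node_id = int(node_dir.name[len("node"):])
        distances[node_id] = [int(x) for x in distance_file.read_text().split()]
    # Sort by node id so the rows print in a stable order.
    return dict(sorted(distances.items()))


if __name__ == "__main__":
    table = read_numa_distances()
    if not table:
        print("no NUMA distance information found")
    for node, row in table.items():
        print(f"node{node}: {row}")
```

A socket reported as a single node prints just one row, which is consistent with the point above: the reported topology says nothing about differences within that node.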


jdsully 2509 days ago

That is true with Intel chips as well. In the HFT space people actively work with Intel to determine which cores they should pin tasks to.

The speed of light is constant, and some cores will always be a little closer to various resources.


shaklee3 2509 days ago

That was true before Skylake, but is no longer true since they moved away from the multi-ring architecture.


jdsully 2509 days ago

Even with the mesh, the number of hops varies depending on which core is requesting and on the physical geometry of the chip. The cores right beside the IMC will have the lowest latency. See this diagram: https://en.wikichip.org/wiki/intel/mesh_interconnect_archite...

The main improvement is that the max number of hops grows roughly as sqrt(N) for a 2D mesh instead of N/2 for a ring.
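
A toy model of the hop counts, assuming a single bidirectional ring and an idealized rows x cols 2D mesh with shortest-path routing. The function names are hypothetical, and real dies have multiple rings, disabled tiles and non-core stops, so treat the numbers as illustrative only. Under these assumptions the mesh's worst case is (rows - 1) + (cols - 1), which for a square grid grows with sqrt(N) rather than N/2.

```python
def ring_max_hops(n):
    """Worst-case hops between two stops on a bidirectional ring of n stops."""
    return n // 2


def mesh_hops(src, dst):
    """Hops between two (row, col) stops on a 2D mesh: Manhattan distance."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def mesh_max_hops(rows, cols):
    """Worst-case hops on a rows x cols mesh: corner to opposite corner."""
    return mesh_hops((0, 0), (rows - 1, cols - 1))


if __name__ == "__main__":
    # Compare the two layouts for a few square grid sizes.
    for side in (4, 6, 8):
        n = side * side
        print(f"N={n}: ring={ring_max_hops(n)} mesh={mesh_max_hops(side, side)}")
```

The same `mesh_hops` function also shows why latency still depends on the core: a stop next to the memory controller is a hop or two away, while one in the far corner pays the full distance.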


wmf 2509 days ago

Epyc 1 was NUMA within the socket, while Epyc 2 is officially UMA within the socket (although not really). Unfortunately, Epyc memory latency is much higher than Intel's, so it's fair to call it uniformly slow.


Jweb_Guru 2508 days ago

Yeah, I actually was not so happy with the benchmarks because the memory access latency is not all that good... for most of the workloads that I care about, I don't know that the Epyc will be faster than a Xeon.
