by xirbeosbwo1234 1928 days ago
That's not quite accurate. Every core has access to the entire L3, including the L3 on an entirely different socket. CPUs communicate through caches, so if a core just plain couldn't talk to another core's cache then cache coherency algorithms wouldn't work. Though a core can access the entire cache, the latency is higher when going off-die. It is really high when going to another socket.

The first generation of Epyc had a complicated hierarchy that made latency quite hard to predict, but the new architecture is simpler. A CPU can talk to a cache in the same package but on a different die with reasonably low latency.

(I don't have numbers. Still reading.)

1 comment

In Zen1, the "remote L3" caches had longer read/write times than DDR4.

Think of the MESI messages that must happen before you can talk to a remote L3 cache:

1. Core#0 tries to talk to L3 cache associated with Core#17.

2. Core#17 has to evict data from L1 and L2, ensuring that its L3 cache is in fact up to date. During this time, Core#0 is stalled (or working on its hyperthread instead).

3. Once done, then Core#17's L3 cache can send the data to Core#0's L3 cache.
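The three steps above can be sketched as a toy model. This only illustrates the sequence as described in this comment, not how AMD hardware is actually implemented. The `Step` class, the function names, and the unit costs are all hypothetical placeholders, not measured latencies. For contrast it also models a plain memory read, which has no owning core to wait on.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    cost: int  # arbitrary placeholder units, NOT measured latencies


def remote_l3_read(request=1, flush=1, transfer=1):
    """Serialized steps when Core#0 reads a line owned by Core#17 (toy model)."""
    return [
        Step("Core#0 requests line from Core#17's L3", request),
        Step("Core#17 flushes L1/L2 so its L3 is up to date", flush),
        Step("Core#17's L3 sends line to Core#0's L3", transfer),
    ]


def dram_read(request=1, transfer=1):
    """Serialized steps for a plain memory read: no core owns the data."""
    return [
        Step("Core#0 requests line from memory controller", request),
        Step("DRAM returns line", transfer),
    ]


def total(steps):
    """Sum the placeholder costs of a step sequence."""
    return sum(s.cost for s in steps)


for label, steps in [("remote L3", remote_l3_read()), ("DRAM", dram_read())]:
    print(f"{label}: {len(steps)} steps, placeholder cost {total(steps)}")
```

With equal placeholder costs, the only difference between the two paths is the extra flush step. Whether the remote path ends up slower in practice depends on the real cost of each step, which this sketch does not know.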

----------

In contrast, step#2 doesn't happen with raw DDR4 (no core owns the data).

This fact doesn't change with the new "star" architecture of Zen2 or Zen3. The I/O die just makes it a bit more efficient. I'd still expect remote L3 communications to be as slow as, or slower than, DDR4.