by xirbeosbwo1234
1928 days ago
That's not quite accurate. Every core has access to the entire L3, including the L3 on an entirely different socket. CPUs communicate through caches, so if a core just plain couldn't talk to another core's cache then cache coherency algorithms wouldn't work. Though a core can access the entire cache, the latency is higher when going off-die. It is really high when going to another socket. The first generation of Epyc had a complicated hierarchy that made latency quite hard to predict, but the new architecture is simpler. A CPU can talk to a cache in the same package but on a different die with reasonably low latency. (I don't have numbers. Still reading.)

Think of the MESI messages that must happen before you can talk to a remote L3 cache:
1. Core#0 tries to talk to the L3 cache associated with Core#17.
2. Core#17 has to evict data from L1 and L2, ensuring that its L3 cache is in fact up to date. During this time, Core#0 is stalled (or working on its hyperthread instead).
3. Once that is done, Core#17's L3 cache can send the data to Core#0's L3 cache.
----------
In contrast, step#2 doesn't happen with raw DDR4 (no core owns the data).
This fact doesn't change with the new "star" architecture of Zen 2 or Zen 3. The I/O die just makes it a bit more efficient. I'd still expect remote L3 communications to be as slow as, or slower than, DDR4.
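
To make the comparison concrete, here is a toy Python sketch of the steps listed above. Everything in it (`State`, `Line`, `remote_read_steps`) is made up for illustration, and it only counts protocol steps as described in this comment. It is not a latency model and not how real Zen hardware implements coherence.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


@dataclass
class Line:
    """One cache line in this toy model (hypothetical, not a hardware structure)."""
    owner: Optional[int]            # core whose caches hold the line; None = only in DRAM
    state: State = State.INVALID


def remote_read_steps(requester: int, line: Line) -> List[str]:
    """List the steps a read goes through before the requester sees the data."""
    steps = [f"core{requester}: request line"]
    if line.owner is None:
        # No core owns the data, so nothing has to be written back first.
        steps.append("memory controller: read line from DRAM")
    else:
        if line.state is State.MODIFIED:
            # Step 2 above: the owner's L3 must be brought up to date first.
            steps.append(f"core{line.owner}: write back dirty L1/L2 copy to its L3")
        # Step 3 above: the owner's L3 sends the line to the requester's L3.
        steps.append(f"core{line.owner} L3: send line to core{requester} L3")
        line.state = State.SHARED
    steps.append(f"core{requester}: receive data")
    return steps


if __name__ == "__main__":
    for label, line in [("remote L3", Line(owner=17, state=State.MODIFIED)),
                        ("DDR4", Line(owner=None))]:
        print(label)
        for step in remote_read_steps(0, line):
            print("  ", step)
```

In this model the remote-L3 path has one more step than the DDR4 path whenever the line is dirty in Core#17's private caches. Whether that extra step actually costs more than a DRAM access is exactly the part that needs measured numbers.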