| The author's comments on cache sizes are a bit reductive. Not all "L3" is created equal, and designers always make tradeoffs between capacity and latency. In particular, the EPYC processors achieve such high cache capacities by splitting L3 into slices across multiple silicon dies, and accessing non-local L3 incurs huge interconnect latency - 132ns on latest EPYC vs 37ns on current Xeon [1]. Even DDR4 on Intel (90ns) is faster than much of an EPYC chip's L3 cache. Intel's monolithic die strategy keeps worst case latency low, but increases costs significantly and totally precludes caches in the hundreds of MB. Depending on workload, that may or may not be the right choice. [1] https://www.anandtech.com/show/14694/amd-rome-epyc-2nd-gen/7 |
Are there workloads where AMD suffers because of its L3 design? Maybe, but I've not seen one yet. For a special case like that, I would imagine you could arrange thread affinity to avoid non-local L3 accesses.
On my 3900X, L3 latency is 10.4ns when local.
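
For what it's worth, here is a rough sketch of the affinity idea, assuming Linux and a kernel that exposes cache topology under `/sys/devices/system/cpu/cpuN/cache/`. The helper names (`parse_cpu_list`, `l3_siblings`, `pin_to_local_l3`) are made up for illustration. It looks up which CPUs share an L3 with a given core and restricts the caller to that set; I haven't measured whether it actually helps on any particular workload.

```python
import os


def parse_cpu_list(text):
    """Parse a sysfs CPU list such as '0-2,12-14' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def l3_siblings(cpu):
    """Return the set of CPUs sharing an L3 with `cpu`, or None if unknown.

    Assumes the Linux sysfs cache layout; returns None anywhere else.
    """
    base = f"/sys/devices/system/cpu/cpu{cpu}/cache"
    try:
        for entry in sorted(os.listdir(base)):
            if not entry.startswith("index"):
                continue
            # Each indexN directory describes one cache; find the level-3 one.
            with open(os.path.join(base, entry, "level")) as f:
                if int(f.read()) != 3:
                    continue
            with open(os.path.join(base, entry, "shared_cpu_list")) as f:
                return parse_cpu_list(f.read())
    except (OSError, ValueError):
        pass
    return None


def pin_to_local_l3(cpu=0):
    """Restrict the caller to the CPUs sharing an L3 with `cpu`.

    Returns the CPU set that was applied, or None if nothing was changed.
    """
    siblings = l3_siblings(cpu)
    if not siblings or not hasattr(os, "sched_setaffinity"):
        return None
    # Stay inside whatever CPU set we are already allowed to run on.
    allowed = siblings & os.sched_getaffinity(0)
    if not allowed:
        return None
    os.sched_setaffinity(0, allowed)  # pid 0 means the caller
    return allowed


if __name__ == "__main__":
    print("pinned to:", pin_to_local_l3(0))
```

In practice you would probably pick `cpu` per worker so that each group of threads lands on its own L3 slice, or do the same thing from the command line with `taskset` or `numactl`.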