by dan-robertson
1368 days ago
It would be interesting to have a more detailed understanding of why these are the latencies, e.g. this repo has ‘clusters’ but there is surely some architectural reason for these clusters. Is it just physical distance on the chip or is there some other design constraint? I find it pretty interesting where the interface that CPU makers present (e.g. a bunch of equal cores) breaks down.
|
When you switch to a mesh, the interconnect takes up way more die area, so you have to trade bus width against the area you spend on interconnect. Typically this means running reduced-width buses around the mesh, which limits core-to-core bandwidth. That's not a big deal if you're running a server, but it's more of a problem if you're running interactive workloads for a user. Unless of course you're Apple and just devote a truckload of die space to a fucking mammoth amount of interconnect between your dies.
There are also ancillary concerns like fabrication yield. For instance, AMD probably runs chiplets because they can mix and match yields, which naturally segments the market. Get a CCX with 3 working cores? Pair it with another and you have a 6C/12T CPU. Get a CCX with 2 working cores? Pair it with another and you get a 4C/8T. With a monolithic die, Intel either gets a working die or they don't.
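To make the binning arithmetic above concrete, here's a rough sketch. The function name and the assumptions of two matched CCXs per CPU and two threads per core (SMT) are mine, chosen to match the 6C/12T and 4C/8T examples, not anything AMD publishes.

```python
def sku_from_ccx_pair(working_cores_per_ccx, ccx_count=2, threads_per_core=2):
    """Return a core/thread label for a CPU built from matched CCXs.

    Assumes every CCX in the package has the same number of working
    cores and that SMT gives two threads per core (illustrative only).
    """
    cores = working_cores_per_ccx * ccx_count
    threads = cores * threads_per_core
    return f"{cores}C/{threads}T"


if __name__ == "__main__":
    for working in (2, 3, 4):
        print(working, "working cores per CCX ->", sku_from_ccx_pair(working))
```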
The problem here is that the interconnect between the CCXs is relatively slow. Dog slow compared to the ring bus. Even running the Infinity Fabric clock (FCLK) at 1.8GHz only nets you 57.6GB/sec between CCXs, at five times the latency of the ring bus. When you look at a Ryzen 3 3100 (2x2 CCX) and a Ryzen 3 3300X (1x4 CCX), the difference in performance is non-trivial, and that's the Infinity Fabric dragging performance down. In comparison, an Intel core's L3 cache on a 3GHz ring bus (i.e. non-turbo) pulls down 96GB/sec. Sure, you're still ultimately limited by DRAM, but if stuff is staying in LLC it's a hell of a performance boost. In Zen 3 AMD even went to 8-core CCXs, which gave the whole thing a huge performance boost. Part of that was because the smaller lithography gave them more area to play with, so they could fit everything plus the interconnects onto the chiplet size they needed.
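For what it's worth, both bandwidth figures above fall out of the same back-of-the-envelope formula, clock times bytes moved per cycle, if you assume a 32-byte-per-cycle link in each case. That width is my assumption to make the numbers line up, and the function name is made up for illustration; this is peak theoretical bandwidth, not measured throughput.

```python
def peak_bandwidth_gb_per_s(clock_ghz, bytes_per_cycle=32):
    """Peak link bandwidth in GB/s: clock (GHz) x bytes moved per cycle.

    bytes_per_cycle=32 is an assumed link width, not a vendor-quoted spec.
    """
    return clock_ghz * bytes_per_cycle


if __name__ == "__main__":
    fabric = peak_bandwidth_gb_per_s(1.8)  # Infinity Fabric at 1.8GHz FCLK
    ring = peak_bandwidth_gb_per_s(3.0)    # ring bus at 3GHz
    print(f"Infinity Fabric: {fabric:.1f} GB/s")
    print(f"Ring bus:        {ring:.1f} GB/s")
    print(f"Ring / fabric:   {ring / fabric:.2f}x")
```

So on bandwidth alone the ring bus comes out a bit under 1.7x ahead at these clocks; the latency gap is a separate issue on top of that.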
So yeah, I hope that greatly oversimplified, surface-level look was helpful.