by rayiner 400 days ago
Granite Rapids is also a better example because it's an enterprise processor with a huge monolithic die (almost 600 square mm).

A key distinction, however, is latency. I don't know about Granite Rapids, but sources show that Sapphire Rapids had an L3 latency around 33 ns: https://www.tomshardware.com/news/5th-gen-emerald-rapids-cpu.... According to the article, the L2 latency in the Telum II chips is just 3.8 ns (about 21 clock cycles at 5.5 GHz). Sapphire Rapids has an L2 latency of about 16 clock cycles.
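
To sanity-check the ns-to-cycles figure above, here is a rough sketch of the conversion. The helper names `ns_to_cycles` and `cycles_to_ns` are made up for illustration; the only inputs taken from the comment are the 3.8 ns latency and the 5.5 GHz clock.

```python
def ns_to_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Convert a latency in nanoseconds to clock cycles.

    A clock of N GHz ticks N times per nanosecond, so cycles = ns * GHz.
    """
    return latency_ns * clock_ghz


def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Convert a latency in clock cycles back to nanoseconds."""
    return cycles / clock_ghz


# Telum II L2: 3.8 ns at 5.5 GHz, both figures as quoted in the comment.
telum_l2_cycles = ns_to_cycles(3.8, 5.5)
print(round(telum_l2_cycles, 1))  # 20.9, i.e. "about 21 clock cycles"
```

Going the other way for Sapphire Rapids (`cycles_to_ns(16, f)`) needs a clock frequency `f`, which the comment doesn't give, so the 16-cycle figure isn't directly comparable in ns without picking one.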

IBM's cache architecture enables a different trade-off in terms of balancing the L2 versus L3. In Intel's architecture, the shared L3 is inclusive, so it has to be at least as big as L2 (and preferably, a lot bigger). That weighs in favor of making L2 smaller, so most of your on-chip cache is actually L3. But L3 is always going to have higher latency. IBM's design improves single-thread performance by allowing most of the on-chip cache to be lower-latency L2.