by rnvannatta 2078 days ago
Having more cache can potentially make the cache slower, as the access time is limited by the propagation delay of the longest path.

So there's a tradeoff between cache size and cache speed, which is why there are separate L1, L2, and L3 caches of various sizes. So potentially the L3 cache in this architecture could be slower than the L3 cache in the 3000 series. It could also be the same speed if the size was limited for other reasons, such as yield.


While this is true, in practice this is a good tradeoff for the vast majority of applications: the relative slowdown of the L3 cache is tiny, while the benefit from fewer cache misses is huge.
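One rough way to put numbers on this tradeoff is the usual average-access-time estimate: hit latency plus miss rate times miss penalty. The sketch below is illustrative only. Every latency and miss rate in it is a made-up assumption, not a measurement of any Zen part, and the function name is hypothetical.

```python
def avg_l3_access_cycles(l3_hit_cycles, l3_miss_rate, dram_penalty_cycles):
    """Average cost of an access that reaches L3: the hit latency plus
    the miss rate times the extra cost of going out to DRAM."""
    return l3_hit_cycles + l3_miss_rate * dram_penalty_cycles

# Hypothetical numbers, chosen only to illustrate the shape of the tradeoff.
DRAM_PENALTY = 300  # assumed extra cycles for a DRAM access

smaller_faster = avg_l3_access_cycles(40, 0.30, DRAM_PENALTY)
larger_slower = avg_l3_access_cycles(46, 0.20, DRAM_PENALTY)

print(f"smaller, faster L3: {smaller_faster:.0f} cycles")
print(f"larger, slower L3:  {larger_slower:.0f} cycles")
```

With these assumed inputs, the 6 extra cycles per hit are outweighed by the roughly 30 cycles saved on misses, so the average drops from about 130 to about 106 cycles. A workload whose miss rate barely changes with the larger cache would see the opposite result.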

I'd expect the workloads that could suffer (all else equal) to be something like SIMD-optimized matrix multiply, where you're always able to prefetch the needed elements into cache effectively and memory access tends to be sequential. But those slight losses would likely be dwarfed by the improved core clocks, etc.

AMD claims a significant improvement in memory latency though, which is concordant with their large gains in gaming workloads (a 20 % general-purpose-throughput-oriented IPC increase alone would never give you a 20 % FPS increase in games).
A larger cache would improve memory latency, assuming the working set can use the full 36 MB, which I'm sure the two games that had a 20% uplift can.

It's pure speculation, but I suspect the cache size was limited by yield concerns rather than timing constraints. It looks like the 5600X has 1 MB less cache, so they probably engineered a way to disable faulty sections of the cache at 1 MB granularity.

Edit: My speculation's wrong. The cache difference between the 5700X and the 5600X is due to core count differences. It's the sum of the various cache sizes, and I misread the slide.

> a 20 % general-purpose-throughput-oriented IPC increase alone would never give you a 20 % FPS increase in games

Is this true even for games that are CPU-bound? When I play MS Flight Simulator, enable the Dev toolbar, and look at the framerate monitor, it tells me that it's spending 20 ms of CPU time per frame, which causes my framerate to cap at 50 fps. A 20% increase in IPC would theoretically bring the frame time to 16.67 ms, giving me a cap of 60 fps.
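The arithmetic in that question can be written out as a small sketch. It assumes the idealized case the question describes: the same clock speed, a fully CPU-bound frame, and CPU time per frame that shrinks in direct proportion to the IPC gain. The function name is made up for illustration.

```python
def cpu_bound_fps(frame_ms, ipc_gain):
    """Frame-rate cap if CPU time per frame shrinks in proportion to IPC.

    Assumes the same clock speed and that all of the frame's CPU work
    benefits equally from the IPC gain (the idealized case).
    """
    new_frame_ms = frame_ms / (1.0 + ipc_gain)
    return new_frame_ms, 1000.0 / new_frame_ms

# 20 ms of CPU time per frame and a 20% IPC increase, as in the comment.
new_ms, fps = cpu_bound_fps(20.0, 0.20)
print(f"{new_ms:.2f} ms per frame, {fps:.1f} fps cap")
```

This reproduces the 16.67 ms and 60 fps figures. Whether a real game gets there depends on how much of that 20 ms actually scales with IPC rather than waiting on memory.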

There was a now-deleted comment about the CPU busy-waiting for the GPU, to which I had this reply:

Reviews/benchmarks of Flight Simulator by e.g. Gamers Nexus show that Flight Simulator is heavily CPU limited, running on a single CPU thread.

Isn't the difference between L1, L2 and L3 also because of the functionality, not just the size and speed? L1 is data and code. L2 is data only, per core. L3 is data, synchronized between cores.

Yeah, technically there are 2 L1 caches; x86 is a 'Modified Harvard' architecture. The instruction cache also typically has to deal with caching micro-ops. I believe L2 and beyond store both instructions and data. There's also cache associativity, where the same location in memory can be stored in one of N locations, and N can differ per level. I think L1 caches are typically more associative, since associativity takes extra silicon per byte and that is easier to afford in a small cache. It looks like Zen 2 at least has an 8-way associative L1 cache.

With smaller caches, and especially L1, you typically want higher associativity.

With 32KB L1 at 64B line size, you can only cache 512 lines. Grouping them in larger buckets means less spillage as hot lines randomly end up in the same bucket.
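As a sketch of the bucket arithmetic: the code below works out the line and set counts for a 32 KB cache with 64 B lines, then counts how many hot lines would not fit in their bucket. It assumes plain modulo indexing on the address, which real hardware may not use, and the hot addresses are made up for illustration. The 8-way figure is the one mentioned upthread.

```python
from collections import Counter

def cache_geometry(size_bytes, line_bytes, ways):
    """Return (total_lines, sets) for a set-associative cache."""
    total_lines = size_bytes // line_bytes
    return total_lines, total_lines // ways

def set_index(addr, line_bytes, sets):
    """Bucket an address falls into, assuming plain modulo indexing."""
    return (addr // line_bytes) % sets

def spills(addrs, size_bytes, line_bytes, ways):
    """Count hot lines that exceed the capacity of their bucket.

    Assumes each address in addrs lies on a distinct cache line.
    """
    _, sets = cache_geometry(size_bytes, line_bytes, ways)
    per_bucket = Counter(set_index(a, line_bytes, sets) for a in addrs)
    return sum(max(0, n - ways) for n in per_bucket.values())

SIZE, LINE = 32 * 1024, 64

# Three hypothetical hot addresses spaced 32 KB apart.
hot = [0x0000, 0x8000, 0x10000]

print(cache_geometry(SIZE, LINE, ways=8))  # lines and sets, 8-way
print(cache_geometry(SIZE, LINE, ways=1))  # lines and sets, direct-mapped
print(spills(hot, SIZE, LINE, ways=1), spills(hot, SIZE, LINE, ways=8))
```

All three addresses land in the same bucket either way. With one line per bucket, two of them keep evicting each other; with eight lines per bucket, all three fit.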

L2 and L3 are unified caches: they contain both data and code.

L1 code and L1 data caches are separate.