| HN Mirror


	by rnvannatta 2078 days ago
	Having more cache can lower the speed of the cache, since access time is bounded by the propagation delay of the longest path. That tradeoff between cache size and cache speed is why there are separate L1, L2, and L3 caches of various sizes. So the L3 cache in this architecture could potentially be slower than the L3 cache in the 3000 series. It could also be the same speed if the size was limited for other reasons, such as yield.

3 comments

alfalfasprout 2078 days ago

While this is true, in practice this is a good tradeoff for the vast majority of applications, since the ratio of the L3 slowdown to the gain from fewer cache misses ends up being tiny:huge.

I'd expect the workloads that could suffer (all else equal) to be something like SIMD-optimized matrix multiply, where you're always able to prefetch the needed elements into cache effectively and memory access tends to be sequential. But those slight losses would likely be dwarfed by the improved core clocks, etc.
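
A rough way to see the tradeoff is the textbook average-access-time formula, hit time plus miss rate times miss penalty. The sketch below is only an illustration: every latency and miss rate in it is a made-up number, not a measurement of any Zen part, and `amat` is a hypothetical helper name.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average access time in cycles for one cache level backed by memory."""
    return hit_time + miss_rate * miss_penalty

# Hypothetical numbers, chosen only to illustrate the shape of the tradeoff.
DRAM_PENALTY = 200  # assumed cycles to go to memory on an L3 miss

# Typical workload: the bigger (slightly slower) L3 removes a lot of misses.
small_fast = amat(hit_time=40, miss_rate=0.30, miss_penalty=DRAM_PENALTY)
large_slow = amat(hit_time=46, miss_rate=0.20, miss_penalty=DRAM_PENALTY)

# Prefetch-friendly workload (e.g. a tuned matrix multiply): misses are
# already rare, so the bigger cache mostly just adds hit latency.
small_fast_prefetch = amat(hit_time=40, miss_rate=0.01, miss_penalty=DRAM_PENALTY)
large_slow_prefetch = amat(hit_time=46, miss_rate=0.01, miss_penalty=DRAM_PENALTY)

print(f"typical:  {small_fast:.1f} vs {large_slow:.1f} cycles")
print(f"prefetch: {small_fast_prefetch:.1f} vs {large_slow_prefetch:.1f} cycles")
```

With these assumed inputs the larger cache wins clearly in the first case and loses by a few cycles in the second, which is the pattern described above.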

formerly_proven 2078 days ago

AMD claims a significant improvement in memory latency though, which is concordant with their large gains in gaming workloads (a 20 % general-purpose-throughput-oriented IPC increase alone would never give you a 20 % FPS increase in games).

rnvannatta 2078 days ago

A larger cache size would improve memory latency assuming the working set can utilize the full 36 MB, which I'm sure the 2 games that had a 20% uplift can.

It's purely speculation, but I suspect the cache size was limited by yield concerns rather than timing constraints. It looks like the 5600X has 1 MB less cache, so they probably engineered a way to disable faulty sections of the cache at a 1 MB granularity.

Edit: My speculation's wrong. The cache difference between the 5700X and the 5600X is due to core count differences. It's the sum of the various cache sizes, and I misread the slide.

Sohcahtoa82 2078 days ago

> a 20 % general-purpose-throughput-oriented IPC increase alone would never give you a 20 % FPS increase in games

Is this true even for games that are CPU-bound? When I play MS Flight Simulator, enable the Dev toolbar, and look at the framerate monitor, it tells me that it's spending 20 ms of CPU time per frame, which causes my framerate to cap at 50 fps. A 20% increase in IPC would theoretically bring the frame time to 16.67 ms, giving me a cap of 60 fps.
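
The arithmetic, as a quick sketch. It assumes the frame is entirely CPU-bound and that CPU time per frame scales inversely with IPC at the same clock; both function names are made up for this example.

```python
def fps_cap(cpu_ms_per_frame):
    """Frame rate ceiling when each frame needs this much CPU time."""
    return 1000.0 / cpu_ms_per_frame

def frame_time_after_speedup(cpu_ms_per_frame, speedup):
    """New CPU time per frame, assuming work per frame is fixed and
    throughput rises by `speedup` (0.20 means 20% more work per second)."""
    return cpu_ms_per_frame / (1.0 + speedup)

base_ms = 20.0
new_ms = frame_time_after_speedup(base_ms, 0.20)
print(f"{fps_cap(base_ms):.1f} fps -> {fps_cap(new_ms):.1f} fps ({new_ms:.2f} ms)")
# prints "50.0 fps -> 60.0 fps (16.67 ms)"
```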

nitrogen 2078 days ago

There was a now-deleted comment about the CPU busy-waiting for the GPU, to which I had this reply:

Reviews/benchmarks of Flight Simulator by e.g. Gamers Nexus show that Flight Simulator is heavily CPU limited, running on a single CPU thread.

flavius29663 2078 days ago

isn't the difference between L1, L2 and L3 also because of the functionality, not just the size+speed? L1 is data and code. L2 is data only, per core. L3 is data, synchronized between cores.

rnvannatta 2078 days ago

Yeah, technically there are 2 L1 caches; x86 is a 'Modified Harvard' architecture. The instruction cache also typically has to deal with caching micro-ops. I believe L2 and beyond store both instructions and data. There's also cache associativity, where the same location in memory can be stored in one of N locations, which can differ per level. I think L1 caches are typically more associative because that takes extra silicon per byte. It looks like Zen 2 at least has an 8-way associative L1 cache.
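
To make "one of N locations" concrete, here is a minimal sketch of how an address could be split for an 8-way set-associative cache. The 32 KiB size and 64-byte line are assumptions for illustration, and real hardware has details (virtual vs. physical indexing, for one) that this ignores. `split_address` is a hypothetical helper.

```python
LINE_SIZE = 64            # bytes per cache line (assumed)
CACHE_SIZE = 32 * 1024    # total bytes (assumed)
WAYS = 8                  # N: slots per set
NUM_SETS = CACHE_SIZE // (LINE_SIZE * WAYS)  # 64 sets

def split_address(addr):
    """Return (tag, set_index, offset) for a byte address.

    The set index picks one set; the line may then sit in any of the
    WAYS slots of that set, and the tag tells lines in a set apart.
    """
    offset = addr % LINE_SIZE
    set_index = (addr // LINE_SIZE) % NUM_SETS
    tag = addr // (LINE_SIZE * NUM_SETS)
    return tag, set_index, offset

print(split_address(0x1234))  # (1, 8, 52)
# Addresses LINE_SIZE * NUM_SETS = 4096 bytes apart compete for the same set.
print(split_address(0)[1] == split_address(4096)[1])  # True
```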

labawi 2076 days ago

With smaller caches and especially L1 you typically want higher associativity.

With 32KB L1 at 64B line size, you can only cache 512 lines. Grouping them in larger buckets means less spillage as hot lines randomly end up in the same bucket.
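
A toy model of the bucket argument, as a sketch only. It places a random set of hot lines into a 512-line cache and counts how many cannot fit because their bucket is already full. It ignores replacement policy and access order entirely, and `spilled_lines` is a made-up helper.

```python
import random
from collections import Counter

CACHE_LINES = 32 * 1024 // 64  # 512 lines

def spilled_lines(hot_lines, ways):
    """Count hot lines that do not fit because their set is already full."""
    sets = CACHE_LINES // ways
    per_set = Counter(line % sets for line in hot_lines)
    return sum(max(0, count - ways) for count in per_set.values())

rng = random.Random(0)                 # fixed seed so runs are repeatable
hot = rng.sample(range(1 << 20), 256)  # 256 distinct hot lines, half the capacity

for ways in (1, 2, 4, 8, 512):
    print(f"{ways:>3}-way: {spilled_lines(hot, ways)} of {len(hot)} hot lines spill")
```

In this model the spill count can only stay the same or shrink as buckets get larger, and it reaches zero for the fully associative case (512-way) as long as the hot set fits in the cache.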

vardump 2078 days ago

L2 and L3 are unified caches; they contain both data and code.

L1 code and L1 data caches are separate.
