by gpderetta
3597 days ago
On the other hand, a unified (inclusive) L3 cache helps with maintaining cache coherency, which needs to be handled explicitly in a non-unified design. I guess a big benefit of separate caches is that if only half the cores are in use, you can power down the other half of the cache, saving power and freeing up TDP headroom.
A unified cache also has a bandwidth problem. If 64 threads are vying for access, you either build it with few access ports and it gets choked, or you build it with many access ports, which is costly in area, power, and speed.
Two separate peer caches automatically have twice the bandwidth of one similar double-size cache, at the price of NUMA and cache-coherency challenges.
There is no one right answer here. Bandwidth is far more important and coherency much easier in a small L1; as you go down the hierarchy, bandwidth needs shrink and coherency is more expensive.
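
To put toy numbers on the bandwidth point, here is a rough back-of-the-envelope sketch. Everything in it is an assumption for illustration: the function name, the port count of 4, the unit port bandwidth, and the idea that threads share ports evenly. It also ignores the coherency and NUMA traffic that a split design adds, so it only shows the raw port arithmetic, not a real design comparison.

```python
def per_thread_bandwidth(threads, caches, ports_per_cache, port_bw=1.0):
    """Toy model (hypothetical): aggregate port bandwidth shared evenly by all threads.

    Ignores coherency traffic, NUMA penalties, and bank conflicts.
    """
    total_bw = caches * ports_per_cache * port_bw
    return total_bw / threads


# One double-size cache vs. two peer caches with the same number of ports each.
unified = per_thread_bandwidth(threads=64, caches=1, ports_per_cache=4)
split = per_thread_bandwidth(threads=64, caches=2, ports_per_cache=4)

print(f"unified: {unified} per thread, split: {split} per thread")
```

Matching the split design's aggregate bandwidth with one cache would mean doubling its port count, which is where the area, power, and speed cost comes in.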