|
|
|
|
|
by sliverstorm
3597 days ago
|
|
A unified L3 is expensive in a number of ways. It is large, which means it is geographically remote, as well as slow (for caches, big == slow). This costs lots of access latency. It also has a bandwidth problem. If 64 threads are vying for access, you either build it with few access ports and it gets choked, or you build it with many access ports which is costly in area, power, & speed. Two separate peer caches automatically have twice the bandwidth of one similar double-size cache, for the price of NUMA & cache coherency challenges. There is no one right answer here. Bandwidth is far more important and coherency much easier in a small L1; as you go down the hierarchy, bandwidth needs shrink and coherency is more expensive. |
|
Disclaimer: I don't know sh*t about hardware design as you can probably guess from my posting. ;o)