by Symmetry
3604 days ago
Well, address decoding can be started in parallel with address translation if your page size lets you use a virtually indexed, physically tagged cache, which applies to only some processors. But that's a separate issue from the relationship between cache size and cache speed, which is governed by three things. First, the larger your cache, the more layers of muxing you need to select the data you want, meaning more FO4s of transistor delay. Second, the larger your cache, the physically bigger it is. That means more physical distance between the memory location and where it is used, and so more speed-of-light delay. And third, there's the issue of resolving contention for shared versus unshared caches. So despite the fact that you're using the same SRAM in both your L1 and your L3, access to the former takes 4 clock cycles while access to the latter takes 80.
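
To make the page-size condition concrete, here is a rough sketch of the usual rule of thumb: a virtually indexed, physically tagged cache can be indexed before translation finishes when the bytes covered by one way (cache size divided by associativity) do not exceed the page size, so that all index and line-offset bits come from the untranslated page offset. The function name and the example cache geometries are my own illustrative assumptions, not figures for any particular processor.

```python
def vipt_index_fits_in_page(cache_bytes: int, ways: int, page_bytes: int) -> bool:
    """Return True if the set index and line offset bits fit in the page offset.

    Hypothetical helper for illustration: one way of the cache spans
    cache_bytes / ways bytes, and those address bits must lie entirely
    inside the page offset for the lookup to start before translation.
    """
    if cache_bytes <= 0 or ways <= 0 or page_bytes <= 0:
        raise ValueError("sizes and associativity must be positive")
    way_bytes = cache_bytes // ways
    return way_bytes <= page_bytes


KIB = 1024

# Assumed example geometries, chosen only to show both outcomes.
print(vipt_index_fits_in_page(32 * KIB, 8, 4 * KIB))  # 4 KiB per way: fits
print(vipt_index_fits_in_page(64 * KIB, 4, 4 * KIB))  # 16 KiB per way: does not fit
```

This only covers the indexing condition; it says nothing about the muxing, distance, or contention costs, which are what actually separate a 4-cycle L1 from an 80-cycle L3.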