by etep
3302 days ago
At a high level it's true that smaller is faster, but it's also true that those L1s could have grown by adding sets (not ways) and kept the same latency. L2 has grown while staying iso-latency, which suggests that "smaller is faster" does not always hold. I'm always impressed that Agner Fog takes the time to publish his results; it's pretty amazing. But I think focusing on the register count in MIPS, or on the uarch details of some random opcode, doesn't get at the real constraints on L1 cache design at all. By that logic x86 should be even faster, because it has far fewer than 32 registers (at least historically). My response is this: yes, the L1 has to be small to be fast, but it has been stuck at 32KB forever now. It could have grown! So it's not as simple as "small is fast."
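To make "adding sets (not ways)" concrete, here is a minimal sketch of how a set-associative cache splits an address. It assumes a 32 KiB, 8-way L1 with 64-byte lines as the baseline (a common geometry, but an assumption here), and compares doubling capacity via sets versus via ways. The helper `cache_geometry` is hypothetical, written only for illustration.

```python
def is_pow2(n: int) -> bool:
    """True if n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


def cache_geometry(size_bytes: int, ways: int, line_bytes: int):
    """Return (sets, offset_bits, index_bits) for a set-associative cache.

    Hypothetical helper: assumes power-of-two size, associativity and line size.
    """
    for value in (size_bytes, ways, line_bytes):
        if not is_pow2(value):
            raise ValueError(f"{value} is not a power of two")
    sets = size_bytes // (ways * line_bytes)
    if sets < 1:
        raise ValueError("cache too small for this ways/line combination")
    offset_bits = line_bytes.bit_length() - 1  # log2(line size)
    index_bits = sets.bit_length() - 1         # log2(number of sets)
    return sets, offset_bits, index_bits


if __name__ == "__main__":
    LINE = 64  # assumed line size in bytes
    configs = [
        ("baseline", 32 * 1024, 8),
        ("2x via sets", 64 * 1024, 8),
        ("2x via ways", 64 * 1024, 16),
    ]
    for label, size, ways in configs:
        sets, off, idx = cache_geometry(size, ways, LINE)
        print(f"{label:12} {size // 1024:3} KiB {ways:2}-way: "
              f"{sets:4} sets, {off} offset bits + {idx} index bits")
```

Growing by sets keeps the number of tags compared per lookup fixed but adds an index bit (6 to 7 here); growing by ways keeps the index the same but doubles the tags compared in parallel. In the baseline, offset plus index spans 12 bits, the same width as a 4 KiB page offset.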
I also vaguely remember the Mill CPU guy talking about cache size constraints due simply to the speed of light. But given that node size has continued to shrink over the last decade while frequency has nearly stopped increasing, this might be less of an issue than basic area optimization. Or it might be an interesting consideration for the Mill only because it is a radically different architecture and needs different area ratios.
Only wild guesses, though; I haven't even tried to confirm any of this with research or back-of-the-envelope calculations.
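A rough back-of-the-envelope sketch of the speed-of-light bound: how far a signal could travel in one clock cycle at a given frequency. The vacuum speed of light is only an upper bound; real on-chip wires are RC-limited and much slower, so `fraction_of_c` is a hypothetical knob for illustration, not a measured value. The frequencies below are assumed examples.

```python
C_VACUUM_M_PER_S = 299_792_458.0  # speed of light in vacuum


def distance_per_cycle_mm(freq_ghz: float, fraction_of_c: float = 1.0) -> float:
    """Distance (mm) a signal covers in one clock period.

    fraction_of_c is a hypothetical propagation-speed factor; 1.0 gives the
    vacuum upper bound, and real on-chip wires would be well below that.
    """
    if freq_ghz <= 0 or not (0 < fraction_of_c <= 1.0):
        raise ValueError("frequency must be positive and fraction in (0, 1]")
    period_s = 1.0 / (freq_ghz * 1e9)
    return C_VACUUM_M_PER_S * fraction_of_c * period_s * 1000.0


if __name__ == "__main__":
    for ghz in (1.0, 3.0, 4.0, 5.0):  # assumed example clock rates
        print(f"{ghz:.1f} GHz: {distance_per_cycle_mm(ghz):6.1f} mm per cycle "
              f"(vacuum upper bound)")
```

At 4 GHz the vacuum bound is about 75 mm per cycle, far larger than an L1 array, which is why any real limit would come from wire delay rather than the raw speed of light.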