|
|
|
|
|
by CalChris
3298 days ago
|
|
I think a simpler argument is that for L1 you want fast, not big. Same thing with registers (a form of cache at a lower level). Why did MIPS only have 32 registers? Design Principle 2: Smaller is faster. [1] BTW, if you look at Agner Fog's latency tables [2], mov mem,r (load) went from 3 cycles in Haswell to 2 cycles in Skylake. So Intel has been concentrating on faster which is nice. And by way of comparison, AMD increased their μop cache size in Ryzen but then only slightly. Way size went from 6 μops to 8. This matches their increase in EUs. [1] Patterson and Hennessy. Computer Organization and Design, 5th edition, p. 67. [2] http://www.agner.org/optimize/instruction_tables.pdf |
|
Always impressed that Agner Fog takes the time to publish his results. Pretty amazing. But I think focusing your thinking on the register count in MIPs or the the uarch for some random opcode does not get into the real constraints on L1 cache design at all. One could say that x86 should be even faster, because hey, far less than 32 registers (or historically at least).
My response is like this: yes, the L1 has to be small to be fast, but it has been stuck at 32KB forever now. It could have grown! So it's not as simple as small is fast.