|
|
|
|
|
by mnw21cam
712 days ago
|
|
The article states that the CPU has a limit of 4 instructions per cycle, but the sum2 method issues 5 instructions per cycle. Presumably one of them (maybe the increment) is trivial enough to be executed as a fifth instruction. |
|
uiCA is a very nice tool which tries to simulate how instructions will get scheduled, e.g. this is the trace it produces for sum3 on Haswell, showing the fusion: https://uica.uops.info/tmp/75182318511042c98d4d74bc026db179_... .