|
Taking the example:

    cmpb $115, %cl
    sete %dl
    addl %edx, %eax

vs

    cmpb $115, %cl
    jne _run_switches_jmptgt1
    mov $1, %dl
    _run_switches_jmptgt1:
    addl %edx, %eax

The argument for why `jne` might be faster is that in the former case, the CPU always executes a dependency chain of length 3: `cmpb` -> `sete` -> `addl`. Each of these instructions has to be computed one after the other, as `sete` depends on the result of `cmpb`, and `addl` depends on the result of `sete`. With `jne`, the CPU might predict that the branch is not taken, in which case the dependency chain is
`mov` -> `addl` (the `mov` of an immediate might be handled by register renaming?). Or it might predict that the branch is taken, in which case the dependency chain is just `addl`. I guess you're arguing that the CPU should handle `sete` the same way?
That is, instead of treating `addl` as dependent on the result, predict what `sete` does and start executing `addl` before `sete` finishes, rewinding if that went wrong? |
Microcode can set the `EIP` register based on its prediction of what the result of `cmpb $115, %cl` will be.
Why can't it set the `EDX` register based on its prediction of what the result of `cmpb $115, %cl` will be?
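
For readers who want the two instruction sequences above in source form, here is a rough C sketch of the two shapes being compared. The function names `count_s_branchless` and `count_s_branchy` are made up for illustration, and the surrounding loop is an assumption (the thread only shows the inner instructions). Note that a compiler is free to emit either `sete` or `jne` for either function, so this only illustrates the data flow and does not reliably reproduce the assembly.

```c
#include <stddef.h>

/* Hypothetical helper: counts bytes equal to 115 ('s') in the sete style.
   The comparison result is materialized as 0 or 1 and always added, so the
   add has a data dependency on the compare: cmpb -> sete -> addl. */
static int count_s_branchless(const char *s, size_t n) {
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        int is_s = (s[i] == 115); /* what sete would write to %dl */
        count += is_s;            /* what addl would consume */
    }
    return count;
}

/* Hypothetical helper: same result in the jne style.
   The increment is selected by control flow. If the branch is predicted,
   the add only waits on a constant (mov -> addl) or on nothing new. */
static int count_s_branchy(const char *s, size_t n) {
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        int inc = 0;
        if (s[i] == 115) { /* jne skips the assignment when not equal */
            inc = 1;       /* mov $1, %dl */
        }
        count += inc;      /* addl %edx, %eax */
    }
    return count;
}
```

Both functions return the same count for any input. The only difference is whether the increment reaches the add through a data dependency or through a predicted branch, which is the distinction the question above is about.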