
SUB does not have a higher latency than XOR on any Intel CPU when the operations are actually executed, e.g. when their operands are distinct registers.

The weird values in your list, i.e. those where the latency is below 1 clock cycle, come from cases where the operations were not executed at all.

Various special cases are detected so that the operation is never executed in an ALU. For instance, when both operands of XOR/SUB are the same register, the operation is not performed and a zero result is produced directly. On certain CPUs, cases where one operand is a small constant are also detected and handled by special circuits at the register renamer stage, so such operations never reach the schedulers for the execution units.
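To make the same-operand case concrete, here is a toy Python model of that check. It is only a sketch of the idea, not how any real renamer is implemented: the `Instr` type and the function names are made up for illustration, and the small-constant case is left out because the details differ between CPUs.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    """Hypothetical two-operand x86-style instruction (illustration only)."""
    op: str   # mnemonic, e.g. "xor" or "sub"
    dst: str  # destination register, which is also the first source
    src: str  # second source register


# Operations whose result is always zero when both operands are the same register.
ZEROING_OPS = {"xor", "sub"}


def is_zeroing_idiom(instr: Instr) -> bool:
    """True if the result is known to be zero without computing anything."""
    return instr.op in ZEROING_OPS and instr.dst == instr.src


def reaches_alu(instr: Instr) -> bool:
    """In this toy model, only non-idiom instructions are sent to an ALU."""
    return not is_zeroing_idiom(instr)


program = [
    Instr("xor", "eax", "eax"),  # same operands: resolved early, no ALU
    Instr("sub", "ebx", "ebx"),  # same operands: resolved early, no ALU
    Instr("xor", "eax", "ebx"),  # distinct operands: really executed
    Instr("sub", "ecx", "edx"),  # distinct operands: really executed
]

executed = [instr for instr in program if reaches_alu(instr)]
print(f"{len(executed)} of {len(program)} instructions reach an ALU")
```

Only the two instructions with distinct registers survive the filter, and those are the ones whose latency a benchmark can actually observe.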

To understand what those values mean, we would need to see the actual loop that was used to measure the latency.

In reality, the latency measured between truly dependent instructions cannot be less than 1 clock cycle. If a latency-measuring loop reports a time that, divided by the number of instructions, is less than 1 cycle, that is because some of those instructions were skipped. So the XOR latency-measuring loop must have included XORs between identical operands, which were bypassed.
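The arithmetic behind that can be sketched in a few lines of Python. This is a simplified model under two assumptions that are mine, not measured facts: every executed instruction adds exactly `real_latency_cycles` to the dependency chain, and every skipped instruction adds nothing. The function name and parameters are hypothetical.

```python
def apparent_latency(total_instructions: int,
                     eliminated: int,
                     real_latency_cycles: float = 1.0) -> float:
    """Cycles per instruction that a naive benchmark would report.

    The benchmark divides the elapsed cycles by ALL instructions in the
    loop, but in this model only the executed ones contribute any time.
    """
    if total_instructions <= 0:
        raise ValueError("total_instructions must be positive")
    if not 0 <= eliminated <= total_instructions:
        raise ValueError("eliminated must be between 0 and total_instructions")
    executed = total_instructions - eliminated
    elapsed_cycles = executed * real_latency_cycles  # skipped ones cost 0 here
    return elapsed_cycles / total_instructions


# Every instruction really executed: the reported value equals the true latency.
print(apparent_latency(1000, 0))    # 1.0
# Three quarters of the instructions skipped: the reported value drops below 1.
print(apparent_latency(1000, 750))  # 0.25
```

In this model a reported value below 1 can only appear when `eliminated` is greater than zero, which is the point made above: the sub-cycle numbers say something about the benchmark loop, not about the ALU.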