by john-h-k
381 days ago
This isn't just due to the actual dependencies of flag instructions at the hardware level (although that is likely a factor); it also majorly affects code layout. On Arm64, for example, ordinary arithmetic only writes flags in its explicit flag-setting forms (`adds`, `subs`, etc.), so you can make a comparison, do other operations, and then consume the result of that comparison afterwards, which is excellent for the pipeline and the OoO engine.
However, because most instructions on x86_64 write flags, you can't do this, so you are forced to cram `jcc`/`setcc` instructions right after the comparison, which is less friendly to compilers and the OoO engine.
And with the compare and jump adjacent, they can be fused together into one uop (macro-fusion), which Intel, AMD, and Apple Silicon all do.