Yes and no. Look at some of the loop optimisations possible on ARM compared to x86-64. I've had x86-64 need 8 instructions for something ARM does in 1.
I remember PPC and its rlwinms and co. My ARM isn’t that good, though I can read it.
But some of those x86 instructions take 0.5 cycles, and some take 0 if they’re removed by fusion or register renaming. x86 has worse problems, like the loop instructions that you can’t actually use but that take up the shortest opcodes.
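For anyone who hasn't met rlwinm (rotate left word immediate, then AND with mask), here is a rough C sketch of what that single PPC instruction computes. The function names `rotl32`, `ppc_mask` and `rlwinm` are just illustrative helpers, not a real library API, and the sketch models only the 32-bit result, not the condition-register update of the record form.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by sh bits (sh is taken modulo 32). */
static uint32_t rotl32(uint32_t x, unsigned sh) {
    sh &= 31;
    return sh ? (x << sh) | (x >> (32 - sh)) : x;
}

/* Build a mask of ones from bit mb through bit me inclusive, using
 * IBM bit numbering (bit 0 is the most significant bit).
 * When mb > me the mask wraps around the ends of the word.
 * Assumes mb and me are both in the range 0..31. */
static uint32_t ppc_mask(unsigned mb, unsigned me) {
    uint32_t from_mb = 0xFFFFFFFFu >> mb;        /* ones in bits mb..31 */
    uint32_t to_me   = 0xFFFFFFFFu << (31 - me); /* ones in bits 0..me  */
    return (mb <= me) ? (from_mb & to_me) : (from_mb | to_me);
}

/* Sketch of rlwinm rA, rS, SH, MB, ME: rotate, then mask. */
uint32_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me) {
    return rotl32(rs, sh) & ppc_mask(mb, me);
}
```

With this sketch, `rlwinm(x, 8, 24, 31)` pulls out the top byte of `x`, and `rlwinm(x, 4, 0, 27)` behaves like `x << 4`, so one instruction covers shifts, rotates and bitfield extracts.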