> That's only if your code happens to be particularly "64-bit-heavy", or the compiler isn't doing a good job at selecting registers

No, that's true for basically all code. The 6 or 7 usable low registers aren't enough for anything interesting, so you pretty much always end up hitting the high registers.

> Plus, what can be done with a single 4-byte instruction on x86 can require multiple 4-byte ARM instructions, and that adds up quickly.

The only real difference is that you have memory load addressing modes in x86, while for load-store architectures like AArch64 you don't. But:

* On x86-64 you have two-address instructions, not three-address instructions. This means that AArch64 "sub x9,x10,x11", or "49 01 0b cb", becomes x86-64 "mov r9,r10; sub r9,r11", or "4d 89 d1 4d 29 d9": 4 bytes vs. 6, thanks to the doubled REX prefix.

* On x86-64 immediates are very inefficiently encoded, while they tend to be compressed on RISCs to fit in the 32-bit instruction word. The end result is that AArch64 "sub x9,x10,#1234", or "49 49 13 d1", becomes x86-64 "lea r9,[r10-1234]", which is "4d 8d 8a 2e fb ff ff": 4 bytes vs. 7, because -1234 doesn't fit in a signed byte and so takes a full 32-bit displacement.
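
If you want to check those byte counts yourself, here is a minimal Python sketch that hand-assembles just the forms used in the two bullets above. The helper names (`a64_sub_reg`, `x64_rr`, and so on) are made up for illustration, and it only covers register-direct and base+disp32 operands, so treat it as a sanity check rather than a real assembler.

```python
import struct

def a64_sub_reg(rd, rn, rm):
    """SUB Xd, Xn, Xm (shifted-register form, LSL #0, 64-bit)."""
    word = 0xCB000000 | (rm << 16) | (rn << 5) | rd
    return struct.pack("<I", word)  # A64 instructions are little-endian words

def a64_sub_imm(rd, rn, imm12):
    """SUB Xd, Xn, #imm12 (unshifted 12-bit immediate, 64-bit)."""
    assert 0 <= imm12 < 4096
    word = 0xD1000000 | (imm12 << 10) | (rn << 5) | rd
    return struct.pack("<I", word)

def rex(w, reg, rm):
    """REX prefix 0100WRXB; X stays 0 because no index register is used."""
    return 0x40 | (w << 3) | ((reg >> 3) << 2) | (rm >> 3)

def modrm(mod, reg, rm):
    return (mod << 6) | ((reg & 7) << 3) | (rm & 7)

def x64_rr(opcode, dst, src):
    """64-bit 'op r/m64, r64' with a register destination (mod = 11)."""
    return bytes([rex(1, src, dst), opcode, modrm(3, src, dst)])

def x64_lea_disp32(dst, base, disp):
    """LEA dst, [base + disp32]; rsp/r12 as base would need a SIB byte."""
    assert base & 7 != 4
    head = bytes([rex(1, dst, base), 0x8D, modrm(2, dst, base)])
    return head + struct.pack("<i", disp)

# Register numbers: x9/x10/x11 on AArch64, r9/r10/r11 on x86-64.
sub_reg_a64 = a64_sub_reg(9, 10, 11)
sub_reg_x64 = x64_rr(0x89, 9, 10) + x64_rr(0x29, 9, 11)  # mov, then sub
sub_imm_a64 = a64_sub_imm(9, 10, 1234)
sub_imm_x64 = x64_lea_disp32(9, 10, -1234)

for name, code in [
    ("a64 sub x9,x10,x11", sub_reg_a64),
    ("x64 mov r9,r10; sub r9,r11", sub_reg_x64),
    ("a64 sub x9,x10,#1234", sub_imm_a64),
    ("x64 lea r9,[r10-1234]", sub_imm_x64),
]:
    print(f"{name:28} {code.hex(' ')}  ({len(code)} bytes)")
```

It needs Python 3.8 or later for `bytes.hex(' ')`. Both x86-64 sequences pay for a REX prefix per instruction, and the `lea` additionally pays four bytes for the displacement.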

> I can't find it at the moment but one of the studies I remember comparing the binary sizes was using GCC, which is widely available and free, but probably one of the worst compilers at x86 size optimisation I've seen.

LLVM is doing pretty well at x86-64 size optimization: for example, it prefers to select lower registers to reduce size. As I recall, Dan Gohman told me the code size win was something on the order of 2%. It really doesn't make a big difference: AArch64 and x86-64 have about the same code size.

> you can write a Fibonacci calculator for the latter in 5 bytes

But real code, again, hits the high registers.

> pushes and pops are single-byte instructions

Most compilers emit pushes and pops only in function prologs and epilogs. This is actually an example of inefficiency in the design of x86-64: opcode space shouldn't go to instructions that are only used to set up and tear down functions.

> on the former even a register-register move is 4 bytes

"mov r11,r12" is 3 bytes on x86-64. Not a big difference…