Hacker News
by clausecker 251 days ago
ARM64 has a trick up its sleeve: many instructions that would be longer on other architectures are instead split into easily recognisable pairs on ARM64. This allows simple implementations to pretend the ISA is fixed length, while more complex ones can pretend it's variable length. SVE takes this one step further with MOVPRFX, which can be placed before almost all SVE instructions to supply masking and a third operand.
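To make the "split into pairs" idea concrete, here is a minimal Python sketch of one such pairing. The comment above names only MOVPRFX, so picking MOVZ/MOVK is my assumption about the kind of sequence meant: a wide constant that would need a long instruction elsewhere is built from 32-bit instructions that each carry one 16-bit chunk. The function names `split_constant`, `emulate` and `to_asm` are made up for this illustration.

```python
MASK64 = (1 << 64) - 1

def split_constant(value):
    """Split a 64-bit constant into (op, imm16, shift) steps: one MOVZ, then MOVKs."""
    if not 0 <= value <= MASK64:
        raise ValueError("value must fit in 64 bits")
    chunks = [((value >> shift) & 0xFFFF, shift) for shift in (0, 16, 32, 48)]
    # Skip all-zero chunks; MOVZ already zeroes the rest of the register.
    nonzero = [(imm, shift) for imm, shift in chunks if imm] or [(0, 0)]
    return [("movz" if i == 0 else "movk", imm, shift)
            for i, (imm, shift) in enumerate(nonzero)]

def emulate(steps):
    """Apply the steps to a register: MOVZ zeroes other bits, MOVK keeps them."""
    reg = 0
    for op, imm, shift in steps:
        if op == "movz":
            reg = imm << shift
        else:  # movk replaces only the addressed 16-bit field
            reg = (reg & ~(0xFFFF << shift) & MASK64) | (imm << shift)
    return reg

def to_asm(steps, reg="x0"):
    """Render the steps as AArch64 assembly text."""
    return [f"{op} {reg}, #0x{imm:x}, lsl #{shift}" for op, imm, shift in steps]

if __name__ == "__main__":
    for line in to_asm(split_constant(0x12345678)):
        print(line)
```

A simple core just executes these one after another as ordinary fixed-length instructions. A wider core is free to recognise the adjacent pair and treat it as a single longer operation, which is the "pretend it's variable length" half of the argument.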

This trick is not getting ARM very far, as evidenced by its abysmal code density.
I'm talking about how they are able to integrate stuff that normally wouldn't fit into 32 bits (such as 3-operand SIMD with masking), not about making the instruction set more compact. ARM knows how to do this (Thumb being the most compact mainstream ISA is evidence of that); they have just decided to waste a bit more space to make decoding simpler, while also adding more quality-of-life features.
Thumb2 was the most compact mainstream 32 bit ISA [1] when the competition was RV32IMAC, but recent RISC-V standard extensions -- such as those implemented in the Hazard3 core in RP2350 -- put RISC-V ahead.

I think that, long term, failing to follow through in their 64 bit ISA with what they learned in Thumb2 will prove to be one of Arm's biggest (technical) mistakes. They thought the only competition they had to match was amd64.

[1] if you don't count Renesas RX as mainstream. It's a better-encoded variation on M68k/Coldfire with 1-8 byte instructions (and an actually good use of 1-byte for e.g. short conditional branches)

It is a trade-off they chose, and they have to pay for it by requiring a larger cache. I argue it wasn't a good choice.

They're not getting significantly simpler decoders relative to RISC-V, which chose the other route, i.e. variable instruction length designed with simple parallel decoding in mind.
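A rough Python sketch of why that length decoding is cheap, assuming only the standard 16-bit and 32-bit encodings are in use (the longer reserved formats are ignored here): the length of an instruction is determined by the low two bits of its first 16-bit parcel, so a decoder can compute a candidate length for every parcel independently and then pick out the real boundaries. The function names are made up for this illustration.

```python
def parcel_length(parcel):
    """Length in bytes of an instruction whose first 16-bit parcel is `parcel`.

    Low two bits == 0b11 means a 32-bit instruction, anything else is 16-bit
    (assumption: no instructions longer than 32 bits are present).
    """
    return 4 if (parcel & 0b11) == 0b11 else 2

def instruction_starts(parcels):
    """Return the byte offsets at which instructions begin."""
    # Step 1: every parcel can be examined independently (parallel in hardware).
    candidate = [parcel_length(p) for p in parcels]
    # Step 2: walk from a known start, skipping parcels that are second halves.
    starts = []
    i = 0
    while i < len(parcels):
        starts.append(i * 2)
        i += candidate[i] // 2
    return starts

if __name__ == "__main__":
    # c.nop (0x0001), then the 32-bit nop 0x00000013 as two little-endian
    # parcels (0x0013, 0x0000), then another c.nop.
    stream = [0x0001, 0x0013, 0x0000, 0x0001]
    print(instruction_starts(stream))
```

Note that the parcel `0x0000` would look like a 16-bit instruction if examined on its own; it is only the boundary walk in step 2 that marks it as the second half of a 32-bit instruction.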

They also do worse in other metrics e.g. longer interdependent instruction chains.

Still generally better than x86, but that's a really low bar to meet.

To be fair, it's a lot better than Power(PC), MIPS, SPARC, Alpha, PA-RISC, Itanium, Elbrus ...