by trepetti
2049 days ago
One of the interesting comments in that article was how they pinned the limited width of x86 decode implementations on variable instruction length. There are obvious code density benefits to the x86 variable-length approach (especially in immediate encoding), but I guess the need for realignment creates long critical paths in the frontend. I wonder if the more constrained variable instruction length of RV64GC (32- and 16-bit instructions only) will be able to scale up to 8-wide instruction decode the way Apple has with AArch64.
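To make the RV64GC case concrete, here is a rough sketch of how instruction boundaries can be found when only 16- and 32-bit encodings are present. It relies on the RISC-V rule that a 16-bit parcel whose two lowest bits are `11` starts a 32-bit instruction, while anything else is a 16-bit compressed instruction. The function name `instruction_lengths` is made up for illustration, and longer (48-bit and up) encodings are ignored.

```python
import struct

def instruction_lengths(code: bytes) -> list[int]:
    """Return the byte length (2 or 4) of each instruction in a little-endian
    RV64GC byte stream. Hypothetical helper; ignores encodings longer than 32 bits."""
    lengths = []
    pc = 0
    while pc + 2 <= len(code):
        # Read the first 16-bit parcel of the instruction at pc.
        (parcel,) = struct.unpack_from("<H", code, pc)
        # Low two bits == 0b11 means a 32-bit instruction, otherwise 16-bit.
        size = 4 if (parcel & 0b11) == 0b11 else 2
        if pc + size > len(code):
            break  # truncated trailing instruction
        lengths.append(size)
        pc += size  # the next start depends on this length: a serial chain
    return lengths

# c.nop (0x0001), addi x0, x0, 0 (0x00000013), c.nop (0x0001)
stream = bytes.fromhex("0100" "13000000" "0100")
print(instruction_lengths(stream))
```

The length check itself only needs two bits, but each start address still depends on every length before it. Presumably a wide decoder would have to evaluate that check at every 2-byte offset in parallel and then pick out the real starts, which is far cheaper than the x86 equivalent but still not free.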
Much of the opcode space is wasted on old single-byte instructions that are rarely used these days. REX prefix bytes are required everywhere to access all 16 registers. Modern instructions that are used all the time are hidden away behind prefix bytes.
64-bit ARM is a complete redesign of the ARM instruction encoding, and they put a lot of thought into using the instruction space optimally. 32-bit ARM also wasted a lot of its instruction encoding space, but 64-bit ARM is a massive improvement. Despite requiring roughly 10% more instructions on average, the average arm64 binary is around the same size as the average x86 binary.
Immediates aren't a huge problem. All 32-bit immediates and many 64-bit immediates can be encoded in 1-2 instructions. Anything that can't should probably just use a PC-relative load.
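As a quick sanity check on the 1-2 instruction claim, here is a small sketch that counts how many MOVZ/MOVN/MOVK instructions a naive sequence needs to materialize a constant on AArch64, 16 bits at a time. The function name `mov_sequence_length` is made up, and it deliberately ignores the bitmask-immediate forms (ORR with a logical immediate), which would let some additional constants fit in a single instruction.

```python
def mov_sequence_length(value: int, bits: int = 64) -> int:
    """Count the instructions in a naive MOVZ/MOVN + MOVK sequence that loads
    `value` into a register of width `bits` (32 or 64).
    Hypothetical helper; does not consider logical (bitmask) immediates."""
    value &= (1 << bits) - 1
    # Split the constant into 16-bit halfwords, the unit MOVZ/MOVN/MOVK operate on.
    chunks = [(value >> shift) & 0xFFFF for shift in range(0, bits, 16)]
    # MOVZ zeroes the register and sets one halfword; each other nonzero
    # halfword needs one MOVK.
    via_movz = sum(c != 0 for c in chunks)
    # MOVN starts from all ones instead; each halfword that is not 0xFFFF
    # costs one instruction.
    via_movn = sum(c != 0xFFFF for c in chunks)
    # At least one instruction is always needed, even for 0 or all ones.
    return max(1, min(via_movz, via_movn))

print(mov_sequence_length(0x12345678, bits=32))
print(mov_sequence_length(0x0000_FFFF_0000_0000))
print(mov_sequence_length(0x1234_5678_9ABC_DEF0))
```

A 32-bit value only has two halfwords, so this sequence can never be longer than two instructions. The worst case for 64-bit values is four, which is where a PC-relative load starts to look better.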
IMO, the much simpler instruction decoding massively outweighs the need for slightly more load bandwidth.