by dmitrygr
815 days ago
This misses an important point: parallel decoding of instructions. That is a lot harder with variable-length instructions, where the length cannot even be calculated from the first byte. In x86 you need to read 10 bytes in the worst case to find an instruction's length. In aarch64 you need to read 0 bytes to know the length: it is always 4.

This matters because of how it interacts with the i-cache. In aarch64 with 64-byte cache lines, one cache line is 16 instructions. Always. In x86, where an instruction can be up to 15 bytes long, that cache line could contain as few as 3 whole instructions. So unless your core can ingest more than one i-cache line per cycle (Intel cores currently can NOT), you are limited by that.
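To make the cache-line arithmetic concrete, here is a rough Python sketch. It is a toy model, not a decoder: the function names are made up for illustration, it never looks at real encodings, and it assumes the 15-byte architectural maximum for an x86 instruction.

```python
CACHE_LINE_BYTES = 64
AARCH64_INSTR_BYTES = 4   # fixed length
X86_MAX_INSTR_BYTES = 15  # architectural maximum (assumption of this sketch)


def fixed_length_starts(line_bytes, instr_bytes):
    """Start offsets for fixed-length instructions.

    Every offset is known without reading a single byte, so each one
    could be handed to a separate decoder at the same time.
    """
    return list(range(0, line_bytes, instr_bytes))


def variable_length_starts(lengths, line_bytes):
    """Start offsets when lengths vary (hypothetical `lengths` input).

    Each start depends on the lengths of all earlier instructions,
    which is the serial dependency that makes parallel decode harder.
    """
    starts, offset = [], 0
    for n in lengths:
        if offset + n > line_bytes:
            break  # this instruction spills into the next line
        starts.append(offset)
        offset += n
    return starts


def whole_instrs_worst_case(line_bytes, max_len):
    """Fewest whole max-length instructions in one line.

    `spill` is how many bytes of an instruction that started in the
    previous line occupy the beginning of this one.
    """
    return min((line_bytes - spill) // max_len for spill in range(max_len))


aarch64_per_line = len(fixed_length_starts(CACHE_LINE_BYTES, AARCH64_INSTR_BYTES))
x86_worst_per_line = whole_instrs_worst_case(CACHE_LINE_BYTES, X86_MAX_INSTR_BYTES)
print(aarch64_per_line, x86_worst_per_line)
```

Under these assumptions the fixed-length case always yields 16 start offsets per line, while the worst-case x86 line holds 3 whole instructions once a previous instruction spills across the line boundary.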
>Another oft-repeated truism is that x86 has a significant ‘decode tax’ handicap. ARM uses fixed length instructions, while x86’s instructions vary in length. Because you have to determine the length of one instruction before knowing where the next begins, decoding x86 instructions in parallel is more difficult. This is a disadvantage for x86, yet it doesn’t really matter for high performance CPUs because in Jim Keller’s words:
>For a while we thought variable-length instructions were really hard to decode. But we keep figuring out how to do that. … So fixed-length instructions seem really nice when you’re building little baby computers, but if you’re building a really big computer, to predict or to figure out where all the instructions are, it isn’t dominating the die. So it doesn’t matter that much.
>...
>Researchers agree too. In 2016, a study supported by the Helsinki Institute of Physics[2] looked at Intel’s Haswell microarchitecture. There, Hirki et al. estimated that Haswell’s decoder consumed 3-10% of package power. The study concluded that “the x86-64 instruction set is not a major hindrance in producing an energy-efficient processor architecture.”