x86’s instruction decoder suffers from its inability to parallelize some things. Because instructions have no fixed boundaries,[a] something has to process the bytes sequentially. Even if the bytes can be fetched from memory in large blocks, something still has to go through them byte by byte to find where each instruction starts and ends. The good news is that once those boundaries are found, the instructions can be decoded into uops. But that ~5% of die space is always running full tilt (provided there are no pipeline stalls). I’m sure Intel and AMD have put a massive amount of work into theirs to make it as quick as possible,[b] but it’s still ultimately a sequential operation.

With RISC architectures such as ARM and RISC-V, you don’t need that boundary detector: just feed the 2 or 4 bytes straight into the decoders.

[a]: Unlike ARM and RISC-V, whose instructions are 2 or 4 bytes long, x86 instructions can be anywhere from 1 to 15 bytes. AArch64 uses a fixed 4-byte encoding; Thumb-2 and RISC-V with the compressed (C) extension mix 2- and 4-byte instructions, but the length is given by a few bits of the first 16-bit halfword.

[b]: Take the EVEX prefix, for example. It is always 4 bytes long, and its first byte is 0x62. So once you see that 0x62 byte after the optional “legacy prefixes”, you can skip 3 bytes and go to the opcode. But then you need to decode that opcode to see whether it has a ModR/M byte, decode that (partially) to see whether there’s an SIB byte, decode that to see whether there’s a displacement (of 1, 2, or 4 bytes), and so on. And then, don’t forget about the immediate (which can be 1, 2, 4, or, in one case of MOV, 8 bytes).
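To make the sequential dependency concrete, here is a minimal Python sketch of a toy x86-64 length decoder. It is an illustration under heavy assumptions, not how Intel or AMD hardware works: it assumes 64-bit mode, knows only the handful of one-byte opcodes listed in `x86_length`, and ignores the 0x0F escape maps, VEX/EVEX, and the case of a REX byte followed by another prefix. The names `x86_length`, `x86_boundaries`, and `fixed_width_boundaries` are hypothetical, made up for this example. The point is that the loop in `x86_boundaries` cannot know where the next instruction starts until it has walked the current one's prefixes, opcode, ModR/M, SIB, displacement, and immediate, whereas with a fixed 4-byte encoding every start offset is simply a multiple of 4.

```python
# Toy x86-64 length decoder (64-bit mode only). Hypothetical sketch: it knows a
# handful of one-byte opcodes and ignores the 0x0F escape maps, VEX/EVEX, and
# many special cases that real decoders must handle.

LEGACY_PREFIXES = {0xF0, 0xF2, 0xF3, 0x26, 0x2E, 0x36, 0x3E, 0x64, 0x65, 0x66, 0x67}


def _modrm_tail(code: bytes, i: int) -> int:
    """Bytes used by ModR/M at code[i] plus any SIB byte and displacement."""
    modrm = code[i]
    mod, rm = modrm >> 6, modrm & 0b111
    n = 1  # the ModR/M byte itself
    if mod == 0b11:
        return n  # register operand: no SIB, no displacement
    if rm == 0b100:  # an SIB byte follows
        n += 1
        if mod == 0b00 and (code[i + 1] & 0b111) == 0b101:
            n += 4  # SIB with no base register: disp32
    elif mod == 0b00 and rm == 0b101:
        n += 4  # RIP-relative: disp32
    if mod == 0b01:
        n += 1  # disp8
    elif mod == 0b10:
        n += 4  # disp32
    return n


def x86_length(code: bytes, start: int) -> int:
    """Length of the instruction beginning at code[start] (toy subset only)."""
    i = start
    opsize16 = False
    while code[i] in LEGACY_PREFIXES:  # 1. legacy prefixes, one byte at a time
        if code[i] == 0x66:
            opsize16 = True
        i += 1
    rex_w = False
    if 0x40 <= code[i] <= 0x4F:  # 2. optional REX prefix
        rex_w = bool(code[i] & 0b1000)
        i += 1
    op = code[i]  # 3. opcode
    i += 1
    if op in (0x90, 0xC3) or 0x50 <= op <= 0x57:  # NOP, RET, PUSH r64
        pass
    elif 0xB8 <= op <= 0xBF:  # MOV r, imm: the one 8-byte immediate case
        i += 8 if rex_w else (2 if opsize16 else 4)
    elif op in (0x01, 0x89, 0x8B):  # ADD r/m,r; MOV r/m,r; MOV r,r/m
        i += _modrm_tail(code, i)  # 4. ModR/M, SIB, displacement
    elif op == 0x83:  # group-1 ALU op, r/m, imm8
        i += _modrm_tail(code, i) + 1
    elif op == 0x81:  # group-1 ALU op, r/m, imm16/imm32 (REX.W beats 0x66)
        i += _modrm_tail(code, i) + (2 if opsize16 and not rex_w else 4)
    elif op == 0xE8:  # CALL rel32
        i += 4
    else:
        raise NotImplementedError(f"opcode {op:#04x} is outside this sketch")
    return i - start


def x86_boundaries(code: bytes) -> list[int]:
    """Each start offset is only known after the previous length is decoded."""
    starts, pos = [], 0
    while pos < len(code):
        starts.append(pos)
        pos += x86_length(code, pos)
    return starts


def fixed_width_boundaries(n_bytes: int, width: int = 4) -> list[int]:
    """Fixed-width encoding (e.g. AArch64): every start offset is known up front."""
    return list(range(0, n_bytes, width))


code = bytes.fromhex(
    "55"                             # push rbp
    "48 89 e5"                       # mov rbp, rsp
    "48 b8 ef cd ab 89 67 45 23 01"  # mov rax, 0x0123456789abcdef
    "8b 44 24 08"                    # mov eax, [rsp+8]
    "83 c0 01"                       # add eax, 1
    "66 81 c1 34 12"                 # add cx, 0x1234
    "8b 05 00 00 00 00"              # mov eax, [rip+0]
    "e8 00 00 00 00"                 # call rel32
    "c3"                             # ret
)
print(x86_boundaries(code))  # [0, 1, 4, 14, 18, 21, 26, 32, 37]
print(fixed_width_boundaries(16))  # [0, 4, 8, 12]
```

Note how `x86_length` has to finish steps 1 through 4 for one instruction before `x86_boundaries` can even look at the next one, while `fixed_width_boundaries` needs nothing but the byte count.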