by SuperscalarMeme
1944 days ago
Performance is agnostic of ISA. Apple's custom-designed cores do indeed have a massive performance-per-watt advantage over x86-based designs, and they happen to use ARM. However, it's not impossible for an x86 CPU to be designed in a similar way. It does get more difficult, though, because of x86's variable-length instruction encoding, which ARM does not have.
The good news is, once those boundaries are found, uops can be generated. But that roughly 5% of die space is always running full tilt (provided there are no pipeline stalls).
I’m sure Intel and AMD have put a massive amount of work into theirs to make it as quick as possible,[b] but it’s still ultimately a sequential operation.
With RISC-like architectures like ARM and RISC-V, you don’t need that boundary detector. Just feed the 2 or 4 bytes straight into the decoders.
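To make the "sequential operation" point concrete, here is a minimal sketch in Python. It is not how any real front end works. `toy_length` is a made-up encoding (the low two bits of the first byte give the length) used only to show the dependency chain. With a fixed width, every instruction start is known up front. With a variable length, each start depends on decoding the previous instruction far enough to know its length.

```python
from typing import Callable


def fixed_width_boundaries(code: bytes, width: int = 4) -> list[int]:
    # Every start offset is known without looking at the bytes:
    # 0, width, 2*width, ... so all decoders can start at once.
    return list(range(0, len(code), width))


def variable_length_boundaries(
    code: bytes, length_at: Callable[[bytes, int], int]
) -> list[int]:
    # The start of instruction N+1 is unknown until the length of
    # instruction N has been worked out, so this loop is a serial chain.
    offsets = []
    pos = 0
    while pos < len(code):
        offsets.append(pos)
        pos += length_at(code, pos)
    return offsets


def toy_length(code: bytes, pos: int) -> int:
    # Hypothetical encoding, NOT x86: low two bits of the first byte
    # select a length of 1 to 4 bytes.
    return 1 + (code[pos] & 0b11)


fixed = fixed_width_boundaries(bytes(16))
variable = variable_length_boundaries(bytes([0x02, 0, 0, 0x00, 0x01, 0]), toy_length)
```

Real x86 decoders soften this with tricks such as speculatively decoding at every byte offset or caching boundaries, but the underlying dependency is the one the loop shows.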
[a]: Unlike ARM and RISC-V, which have fixed 2- or 4-byte encodings (depending on processor mode), x86’s instructions can be anywhere from 1 to 15 bytes long.
[b]: Take the EVEX prefix for example. It is always 4 bytes long with the first one being 0x62. So, once you see that 0x62 byte after the optional “legacy prefixes”, you can skip 3 bytes and go to the opcode. But then you need to decode that opcode to see if it has a ModR/M byte, decode that (partially) to see if there’s an SIB byte, decode that to see if there’s a displacement (of 1, 2, or 4 bytes), etc. And then, don’t forget about the immediate (which can be 1, 2, 4, or (in one case of MOV) 8 bytes).
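The chain of "decode this to find out whether that exists" in [b] can be sketched in Python as below. This is a rough illustration under loud assumptions: 64-bit mode only, one-byte opcodes only, no VEX/EVEX handling, and a tiny hand-picked opcode table (NOP, RET, and a few MOV forms). The function names are mine, not from any library.

```python
LEGACY_PREFIXES = {0xF0, 0xF2, 0xF3, 0x2E, 0x36, 0x3E, 0x26, 0x64, 0x65, 0x66, 0x67}


def modrm_tail_length(code: bytes, pos: int) -> int:
    """Bytes used by ModR/M + optional SIB + displacement (32/64-bit addressing)."""
    modrm = code[pos]
    mod, rm = modrm >> 6, modrm & 0b111
    if mod == 0b11:
        return 1  # register operand: no SIB, no displacement
    length = 1
    base_is_101 = rm == 0b101
    if rm == 0b100:  # an SIB byte follows
        length += 1
        base_is_101 = (code[pos + 1] & 0b111) == 0b101
    if mod == 0b01:
        length += 1  # disp8
    elif mod == 0b10:
        length += 4  # disp32
    elif base_is_101:
        length += 4  # mod == 00 with rm/base == 101: disp32, no base register
    return length


def instruction_length(code: bytes, pos: int = 0) -> int:
    """Length of one instruction from a tiny, assumed subset of x86-64."""
    start = pos
    operand_size_override = False
    while code[pos] in LEGACY_PREFIXES:  # optional legacy prefixes
        operand_size_override |= code[pos] == 0x66
        pos += 1
    rex_w = False
    if 0x40 <= code[pos] <= 0x4F:  # optional REX prefix (64-bit mode)
        rex_w = bool(code[pos] & 0x08)
        pos += 1
    opcode = code[pos]
    pos += 1
    if opcode in (0x90, 0xC3):  # NOP, RET: nothing follows
        pass
    elif opcode in (0x89, 0x8B):  # MOV r/m, r and MOV r, r/m
        pos += modrm_tail_length(code, pos)
    elif 0xB8 <= opcode <= 0xBF:  # MOV r, imm16/32/64
        pos += 8 if rex_w else (2 if operand_size_override else 4)
    elif opcode == 0xC7:  # MOV r/m, imm16/32 (assumes the /0 form)
        pos += modrm_tail_length(code, pos)
        pos += 2 if (operand_size_override and not rex_w) else 4
    else:
        raise NotImplementedError(f"opcode {opcode:#04x} not in this sketch")
    return pos - start


# mov rbp, rsp ; mov eax, [rsp+8] ; ret
stream = bytes.fromhex("4889e5" "8b442408" "c3")
lengths = []
offset = 0
while offset < len(stream):
    n = instruction_length(stream, offset)
    lengths.append(n)
    offset += n
```

Even in this cut-down form, the immediate size depends on the prefixes, the displacement depends on ModR/M and SIB, and nothing about the next instruction is known until all of that is resolved.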