
by colejohnson66 1945 days ago

x86’s instruction decoder suffers from one step it cannot fully parallelize: finding instruction boundaries. Because instructions have no fixed length,[a] something has to process the bytes sequentially. Even if the bytes can be read from memory in massive amounts, something still has to sit there going byte by byte to find the boundaries.

The good news is that once those boundaries are found, uops can be generated. But the decoder, at roughly 5% of die space, is always running full tilt (provided there are no pipeline stalls).

I’m sure Intel and AMD have put a massive amount of work into theirs to make it as quick as possible,[b] but it’s still ultimately a sequential operation.

With RISC architectures such as ARM and RISC-V, you don’t need that boundary detector. Just feed each 2- or 4-byte instruction straight into the decoders.
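A toy sketch of the difference, assuming a made-up variable-length encoding (not real x86) in which the low 4 bits of an instruction's first byte give its total length. The function names are hypothetical. With a fixed width, every boundary is known without looking at the bytes; with a variable width, each boundary depends on the instruction before it.

```python
def fixed_boundaries(code: bytes, width: int = 4) -> list[int]:
    """Fixed-width ISA: boundaries follow from the width alone."""
    return list(range(0, len(code), width))


def variable_boundaries(code: bytes) -> list[int]:
    """Toy variable-length ISA (hypothetical encoding): the low 4 bits of
    the first byte of each instruction hold its total length, 1 to 15."""
    boundaries = []
    offset = 0
    while offset < len(code):
        length = code[offset] & 0x0F
        if length == 0:
            raise ValueError(f"invalid length 0 at offset {offset}")
        boundaries.append(offset)
        # The next boundary is only known after this instruction is inspected.
        offset += length
    return boundaries
```

The loop in `variable_boundaries` carries `offset` from one iteration to the next, which is the sequential dependency; `fixed_boundaries` has none.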

[a]: Unlike ARM and RISC-V, which have fixed 2- or 4-byte encodings (depending on processor mode), x86’s instructions can be anywhere from 1 to 15 bytes long.

[b]: Take the EVEX prefix, for example. It is always 4 bytes long, with the first byte being 0x62. So, once you see that 0x62 byte after the optional “legacy prefixes”, you can skip 3 bytes and go to the opcode. But then you need to decode that opcode to see if it has a ModR/M byte, decode that (partially) to see if there’s a SIB byte, decode that to see if there’s a displacement (of 1, 2, or 4 bytes), etc. And then, don’t forget about the immediate, which can be 1, 2, 4, or (in one case of MOV) 8 bytes.
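The ModR/M part of that chain can be sketched roughly as follows. This is a simplified illustration, assuming 32- or 64-bit addressing only (16-bit addressing uses different rules and 2-byte displacements); the function name is hypothetical, and prefixes, opcodes, and immediates are not handled.

```python
def modrm_tail_length(code: bytes, offset: int) -> int:
    """Return how many bytes the ModR/M byte at code[offset], plus any SIB
    byte and displacement after it, occupy (32/64-bit addressing only)."""
    modrm = code[offset]
    mod = modrm >> 6          # top 2 bits: addressing mode
    rm = modrm & 0b111        # low 3 bits: register or memory selector

    if mod == 0b11:
        return 1              # register operand: no SIB, no displacement

    length = 1
    sib_base_is_5 = False
    if rm == 0b100:           # rm == 4 means a SIB byte follows
        sib = code[offset + 1]
        length += 1
        sib_base_is_5 = (sib & 0b111) == 0b101

    if mod == 0b01:
        length += 1           # 8-bit displacement
    elif mod == 0b10:
        length += 4           # 32-bit displacement
    elif rm == 0b101 or sib_base_is_5:
        length += 4           # mod == 0 special cases: disp32 with no base
    return length
```

Even this small piece has to read one byte before it knows whether to read the next, which is the dependency the footnote describes.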


teucris 1945 days ago

Something has been bugging me about x86’s lack of boundaries...could the boundaries be computed ahead-of-time and passed to the processor?


colejohnson66 1945 days ago

Not that I’m aware of. The decoding of an instruction is complicated and also dependent on the current operating mode and a few other things. So, for an OS to pass those lengths beforehand, it’d have to know everything about the current state of the processor at that instruction. For example, in 16- and 32-bit modes, opcodes 0x40 through 0x4F are single-byte INC and DEC (one for each register). In 64-bit mode, those are the single-byte REX prefixes; the actual opcode follows. See also: the halting problem.
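To illustrate the mode dependence, here is a minimal sketch of how the same byte reads differently in each mode. The function name is hypothetical, and the register names assume a 32-bit operand size (in 16-bit mode they would be ax, cx, and so on).

```python
# Register order used by the single-byte INC/DEC encodings.
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def classify_0x4x(byte: int, long_mode: bool) -> str:
    """Interpret a byte in the range 0x40-0x4F for the given mode."""
    if not 0x40 <= byte <= 0x4F:
        raise ValueError("expected a byte in 0x40..0x4F")
    if long_mode:
        # 64-bit mode: the byte is a REX prefix laid out as 0100WRXB,
        # and the actual opcode follows it.
        w = (byte >> 3) & 1
        r = (byte >> 2) & 1
        x = (byte >> 1) & 1
        b = byte & 1
        return f"REX prefix (W={w}, R={r}, X={x}, B={b})"
    # 16/32-bit modes: 0x40-0x47 are INC, 0x48-0x4F are DEC.
    op = "INC" if byte < 0x48 else "DEC"
    return f"{op} {REGS32[byte & 0b111]}"
```

So 0x48 is a complete one-byte instruction in one mode and only the start of a longer instruction in the other, which changes every boundary after it.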

As for why it became an issue, instruction sets need to be designed from the beginning to be forward expandable. Intel has historically not done that with x86. Take the vector extensions, for example. Originally, SSE was just 128-bit (XMM) vectors encoded as an opcode with various prefix bytes being used in ways they weren’t intended. Later, 256-bit vectors were needed, so AVX introduced the VEX prefix. But it only had 1 bit for vector length. This allowed 128-bit (XMM) and 256-bit (YMM) vectors, but nothing else. So when AVX-512 came along, Intel had to ditch it and create the EVEX prefix and allow both to be used. But EVEX only has 2 bits for vector length. So, should something past AVX-512 come out (AVX-768 or AVX-1024?), it’ll probably use the reserved bit pattern 11, and they’ll be stuck again if they want to go past that.
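The length fields can be sketched like this. It is a simplified illustration with hypothetical function names: it takes the last byte of a VEX prefix (L is bit 2) or the last byte of an EVEX prefix (L'L are bits 6 and 5), and it ignores the EVEX case where those bits are reused for rounding control.

```python
def vex_vector_bits(last_vex_byte: int) -> int:
    """VEX has a single L bit: 0 selects 128-bit, 1 selects 256-bit."""
    l_bit = (last_vex_byte >> 2) & 1
    return 256 if l_bit else 128


def evex_vector_bits(last_evex_byte: int) -> int:
    """EVEX has a 2-bit L'L field: 00, 01, 10 select 128, 256, 512 bits."""
    ll = (last_evex_byte >> 5) & 0b11
    if ll == 0b11:
        # The only pattern left over for any future vector length.
        raise ValueError("L'L = 11 is reserved")
    return 128 << ll
```

With only one unused pattern in the 2-bit field, there is room for exactly one more vector length before a new encoding would be needed.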

For an example of this being done right, ForwardCom[0] (started by the great Agner Fog) took the “forward compatibility” issue into account (hence the name) and used 2 bits to signal the instruction length. It’ll probably never reach silicon, but it and RISC-V (which is in silicon form) are good examples of attempting to keep things forward compatible.

[0]: https://forwardcom.info/


skissane 1945 days ago

> Not that I’m aware of. The decoding of an instruction is complicated and also dependent on the current operating mode and a few other things. So, for an OS to pass those lengths beforehand, it’d have to know everything about the current state of the processor at that instruction.

The compiler would know the instruction boundaries. It could store that information in a read-only section in the executable. The OS would then just pass that section to the CPU somehow.

I don't think there is anything impossible about this. Would there be sufficient performance benefit to justify the added complexity? I don't know, quite possibly not.
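One way such a section might look, purely as a hypothetical sketch: a bitmap with one bit per code byte, set where an instruction starts. Nothing here corresponds to an existing executable format or CPU interface; the function names and layout are assumptions for illustration.

```python
def build_boundary_bitmap(starts: list[int], code_len: int) -> bytes:
    """Compiler side (hypothetical): one bit per code byte, set to 1 at
    each offset where an instruction begins."""
    bitmap = bytearray((code_len + 7) // 8)
    for offset in starts:
        if not 0 <= offset < code_len:
            raise ValueError(f"offset {offset} is outside the code")
        bitmap[offset // 8] |= 1 << (offset % 8)
    return bytes(bitmap)


def is_instruction_start(bitmap: bytes, offset: int) -> bool:
    """Consumer side (hypothetical): look up a boundary without decoding."""
    return bool((bitmap[offset // 8] >> (offset % 8)) & 1)
```

For example, `build_boundary_bitmap([0, 3, 4, 9], 12)` yields a 2-byte table, so the overhead in this layout is one eighth of the code size.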


Cloudef 1945 days ago

This sounds like a potential attack vector.


skissane 1945 days ago

I'm not sure why it would be. If the boundary information were wrong, the CPU instruction decode would fail, but that should just be an invalid instruction exception, which operating systems already know how to handle.
