Hacker News
by Tuna-Fish 2043 days ago
64-bit Arm is fixed width. Modern 32-bit Arm was not fixed width, as Thumb-2 was widely used.

The main difference is that x86 decode is hell to parallelize, because you have no idea where an instruction starts or ends until you know the length of every instruction before it. Instruction lengths form a linear dependency chain, an antipattern in the modern parallel processing world. Modern x86 CPUs have to use a large number of tricks and a lot of silicon to deal with this decently.

With Thumb-2, by contrast, you can simply try decoding an instruction at every halfword. At worst you throw away half of the results, because they turn out to be the second half of an instruction that was already handled. If you tried the same thing with x86, you'd be decoding much more complex encodings at every byte and throwing away many more of the results.
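A rough Python sketch of that "decode at every halfword" idea, to make it concrete. The length rule is the standard Thumb-2 one (a first halfword whose top five bits are 0b11101, 0b11110 or 0b11111 begins a 32-bit instruction); the function names and the sample halfwords are made up for illustration, and real hardware would do step 1 in parallel rather than in a loop.

```python
def thumb2_length(halfword):
    """Length in bytes of a Thumb-2 instruction, given its first halfword."""
    return 4 if (halfword >> 11) in (0b11101, 0b11110, 0b11111) else 2


def find_starts(halfwords):
    """Return the halfword indices that are real instruction starts."""
    # Step 1 (parallel in hardware): guess a length at every halfword slot,
    # without knowing yet whether the slot really begins an instruction.
    guesses = [thumb2_length(h) // 2 for h in halfwords]  # in halfwords

    # Step 2: walk from the known entry point and keep only the real starts.
    # Guesses made at the second half of a 32-bit instruction are never visited.
    starts, i = [], 0
    while i < len(halfwords):
        starts.append(i)
        i += guesses[i]
    return starts


# A 16-bit instruction, both halves of a 32-bit one, then another 16-bit one.
sample = [0x4608, 0xF000, 0xF800, 0xBF00]
print(find_starts(sample))  # [0, 1, 3]
```

Slot 2 (0xF800) also looks like the start of a 32-bit instruction when viewed on its own, which is exactly the kind of wasted guess that gets thrown away.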

Is it really so hard to find instruction lengths in x86? State-machine transitions compose associatively, so you can build a reduction tree to process them in parallel. And the state machine itself isn't too bad: it's mostly prefixes, plus figuring out whether the opcode uses a ModR/M byte (which most do) or has an immediate operand. And while x86 does have a nasty habit of packing multiple instructions into a single opcode (via specific register values in the ModR/M byte), I believe nearly all of them share the same immediate-operand behavior (the F6/F7 group, where TEST takes an immediate but NOT, NEG, MUL and DIV do not, is a known exception).
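To illustrate the reduction-tree argument, here is a minimal Python sketch on a toy encoding, not x86: the low two bits of an opcode byte are assumed to give the number of operand bytes that follow. Every name here is hypothetical. Each byte becomes a small transition table, tables are composed associatively, and a recursive-doubling prefix scan yields the state before every byte in roughly log2(n) rounds. A real x86 length decoder would need a much larger state set (prefixes, ModR/M, SIB, displacement, immediate), so treat this only as a shape of the idea.

```python
STATES = range(4)  # state = operand bytes still owed to the current instruction


def byte_transition(b):
    """Table for one byte: table[s] is the state after consuming b in state s."""
    # Toy rule (an assumption, not x86): in state 0 the byte is an opcode whose
    # low two bits give the operand count; otherwise it is an operand byte.
    return tuple((b & 3) if s == 0 else s - 1 for s in STATES)


def compose(first, second):
    """Table for 'apply first, then second'. Composition is associative."""
    return tuple(second[first[s]] for s in STATES)


def prefix_scan(tables):
    """Inclusive scan by recursive doubling; each round could run in parallel."""
    out = list(tables)
    step = 1
    while step < len(out):
        out = [out[i] if i < step else compose(out[i - step], out[i])
               for i in range(len(out))]
        step *= 2
    return out


def instruction_starts(code):
    scanned = prefix_scan([byte_transition(b) for b in code])
    # The state before byte i is the table for bytes 0..i-1 applied to state 0.
    before = ([0] + [t[0] for t in scanned])[:len(code)]
    return [i for i, s in enumerate(before) if s == 0]


def sequential_starts(code):
    """Reference: the serial dependency chain that the scan replaces."""
    starts, i = [], 0
    while i < len(code):
        starts.append(i)
        i += 1 + (code[i] & 3)
    return starts


code = bytes([0x01, 0xAA, 0x00, 0x02, 0xBB, 0xCC, 0x03])
print(instruction_starts(code))  # [0, 2, 3, 6]
```

The tables stay small because the state set is small; the cost that grows is the number of compose units and the width of each table, which is where a real x86 version would get expensive.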

I suspect that in one pipeline stage you could at least resolve an entire cache line into individual instruction boundaries, so that the instructions can be issued to the uop decoders simultaneously, even if you can't fully decode each instruction into its hardware fields. You wouldn't know whether register 7 referred to a general-purpose register, a debug register, an xmm register, or whatnot, but you'd probably know that it was register 7.

And once you know each instruction boundary, you still have to do a massive mux from positions in the cache line to the separate decoders. As I understand it, that's a big part of the problem, and it essentially costs more than a single pipeline stage.
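A small Python sketch of what that routing step has to compute, assuming the boundaries are already known as one flag per byte. The function name, the slot layout and the sample data are hypothetical; the default window of 15 reflects the x86 maximum instruction length. Each decoder effectively needs a mux whose inputs are every byte position in the line, selected by a prefix count of the start flags.

```python
from itertools import accumulate


def route_to_decoders(line, is_start, n_decoders, window=15):
    """Give decoder k the byte window beginning at the k-th instruction start."""
    bits = [int(s) for s in is_start]
    # Exclusive prefix sum: how many instructions begin before each position.
    rank = [total - bit for total, bit in zip(accumulate(bits), bits)]

    slots = []
    for k in range(n_decoders):
        # One wide mux per decoder: every byte position is a candidate input.
        chosen = [p for p in range(len(line)) if bits[p] and rank[p] == k]
        if chosen:
            pos = chosen[0]
            slots.append((pos, line[pos:pos + window]))
        else:
            slots.append(None)  # fewer instructions than decoders this cycle
    return slots


line = bytes(range(8))
flags = [True, False, True, True, False, False, True, False]
for slot in route_to_decoders(line, flags, n_decoders=3, window=4):
    print(slot)
```

With three decoders, the fourth start (position 6) is simply left for the next cycle. In software the selection is a list comprehension; in hardware it is a byte-granular mux from every line position to every decoder input, which is the part that gets large.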