by ajross
1337 days ago
> In the end, the 8th instruction has 99 possible byte offsets, and assuming that we put, as you suggest, a decoder for each position and length, we need about 1590 decoders and many multiplexers to decode 8 full instructions per cycle.

Um... wat? No CPU tries to decode 99 bytes of memory in a cycle. ADL (Alder Lake) is at 32 bytes per cycle currently, I believe. And the instruction starting at byte 12 doesn't change depending on anything but its own data. It either exists (because the previous instruction ended on byte 11) or it doesn't.

So you decode 32 instructions starting at each byte you've fetched (the last ones can be smaller subset engines because they don't need to decode longer instruction forms), and then mask them on or off based on earlier instruction state. Then feed your 1-32 decoded instructions through a mux tree to pack them and you're done.

Surely there's more complexity, since this is going to have to be pipelined in practice, and a depth of 32 is going to require something akin to a carry-lookahead adder instead of being chained. But the combinatorics you're citing seem ridiculous, I don't understand that at all.
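A rough software sketch of the decode / mask / pack scheme described above, just to make the data flow concrete. Everything here is an assumption for illustration: `toy_length` is an invented length rule (not real x86 length decoding), `decode_window` is a hypothetical name, and the sequential `while` loop stands in for the masking step that hardware would need to flatten with lookahead-style logic.

```python
# Toy model only: the length rule below is invented for illustration and is
# NOT real x86 length decoding.
FETCH_WIDTH = 32  # bytes fetched per cycle, the figure assumed in the comment


def toy_length(window, pos):
    """Hypothetical rule: the low 2 bits of the first byte give a length of 1-4."""
    return (window[pos] & 0x3) + 1


def decode_window(window, first_start=0):
    """Return (packed, spill) for one fetch window.

    packed: list of (offset, length) for the instructions that really start
            in this window.
    spill:  number of bytes by which the last instruction runs past the
            window, i.e. where the first instruction of the next window starts.
    """
    # Step 1: speculatively compute a length at every byte offset. Each
    # offset depends only on its own bytes, so these are independent.
    lengths = [toy_length(window, p) for p in range(len(window))]

    # Step 2: mask. An offset is a real start only if the previous
    # instruction ended right before it. This chain is the serial dependency.
    is_start = [False] * len(window)
    pos = first_start
    while pos < len(window):
        is_start[pos] = True
        pos += lengths[pos]

    # Step 3: pack the surviving entries densely (the "mux tree").
    packed = [(p, lengths[p]) for p in range(len(window)) if is_start[p]]
    return packed, pos - len(window)


if __name__ == "__main__":
    window = bytes(range(FETCH_WIDTH))  # arbitrary sample bytes
    packed, spill = decode_window(window)
    print(len(packed), "instructions, spill =", spill)
```

The speculative work in step 1 is the part the reply below objects to on power grounds: most of the per-offset results are thrown away in step 2.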
Actually, no x86 processor decodes 8 instructions in parallel. It was an example to illustrate how the number of possible start offsets grows when each instruction can have any of 15 lengths (1 to 15 bytes).
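The 99 figure can be checked with a few lines of Python. This is a sketch under one assumption: every length from 1 to 15 bytes is possible for every instruction, so every total between the minimum and maximum is reachable. The function name `possible_start_offsets` is made up for this example.

```python
MIN_LEN = 1   # shortest x86 instruction, in bytes
MAX_LEN = 15  # longest x86 instruction, in bytes


def possible_start_offsets(n):
    """Count the byte offsets at which the n-th instruction (1-based) can start.

    The n - 1 preceding instructions occupy between (n - 1) * MIN_LEN and
    (n - 1) * MAX_LEN bytes, and every total in between is reachable.
    """
    preceding = n - 1
    lowest = preceding * MIN_LEN
    highest = preceding * MAX_LEN
    return highest - lowest + 1


if __name__ == "__main__":
    for n in range(1, 9):
        print(f"instruction {n}: {possible_start_offsets(n)} possible offsets")
    # The 8th instruction can start anywhere from byte 7 to byte 105,
    # which is 99 positions.
```

The count grows by 14 with each additional instruction, which is the scaling the example was meant to show.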
> So you decode 32 instructions starting at each byte you've fetched
No, you don't do that; it consumes too much power.
> But the combinatorics you're citing seem ridiculous, I don't understand that at all.
What I'm trying to explain is that decoding 8 instructions in parallel on x86 is hardly feasible, while decoding 8 (or more) instructions per cycle on a RISC architecture is never a problem.