|
|
|
|
|
by avianes
1326 days ago
|
|
> Um... wat? No CPU tries to decode 99 bytes of memory in a cycle Actually, no x86 processor decodes 8 instructions in parallel. This is an example to illustrate how the number of possible offsets scales with 15 instruction lengths. > So you decode 32 instructions starting at each byte you've fetched No you don't do that, it's too power consuming. > But the combinatorics you're citing seem ridiculous, I don't understand that at all. What I'm trying to explain is that decoding 8 instructions in parallel in x86 is hardly possible, while decoding 8 instructions (or more) from a RISC archi per cycle is never a problem |
|
Uh... yes you do? How else do you think it works? I'm not saying there's no opportunity for optimization (e.g. you only do this for main memory fetches and not uOp execution, pipeline it such that the full decode only happens a stage after length decisions, etc...), I'm saying that it isn't remotely an intractable power problem. Just draw it out: check the gates required for a 64->128 Dadda multiplier or 256 bit SIMD operation and compare with what you'd need here. It's noise.
And your citation of "8 instructions in parallel" seems suspicious. Did I just get trolled into a Apple vs. x86 flame war?