|
|
|
|
|
by ajross
1340 days ago
|
|
> No you don't do that, it's too power consuming. Uh... yes you do? How else do you think it works? I'm not saying there's no opportunity for optimization (e.g. you only do this for main memory fetches and not uOp execution, pipeline it such that the full decode only happens a stage after length decisions, etc...), I'm saying that it isn't remotely an intractable power problem. Just draw it out: check the gates required for a 64->128 Dadda multiplier or 256 bit SIMD operation and compare with what you'd need here. It's noise. And your citation of "8 instructions in parallel" seems suspicious. Did I just get trolled into a Apple vs. x86 flame war? |
|
No, I literally explain it in my first answer. The part about "1590 decoders" is irrelevant since a misunderstood your message (thinking that you are talking about using 16 decoders to decode the 16 instruction lengths of a single instruction).
But the rest on instruction length decode is how you actually do it.
> I'm saying that it isn't remotely an intractable power problem.
I mean, obviously, if you ignore all the power consumption issues of using 32 decoders in parallel and using only 5 of the results out of the 32. Then yes, there's no problem.
But in reality, yes it's a problem to decode many x86 instructions in parallel.
> Just draw it out: check the gates required for a 64->128 Dadda multiplier or 256 bit SIMD operation and compare with what you'd need here. It's noise.
Yes, the energy consumption of the multipliers is high, but I don't see how this is an argument to make an inefficient decoder? Also, a multiplier power consumption depends on transistor activity, and you can expect the MSB of the operand not to change too much. For decoder the transistor activity will be high.
> And your citation of "8 instructions in parallel" seems suspicious. Did I just get trolled into a Apple vs. x86 flame war?
Not a troll nor a flame war. I don't use Apple products, mainly because I don't agree with Apple practices. But actually choosing a RISC ISA allows them to decode a lot of instructions in parallel for little energy and complexity.
I chose 8 because it is the maximum that the mainstream will currently see. You might argue that 8 RISC instructions are not comparable with 8 CISC instructions, but even with say 4 CISC instructions it will still consume more energy