by Tuna-Fish
2851 days ago
> I've heard the "partial register stall" excuse multiple times, ostensibly valid but only if you insist on thinking in "partial registers" instead of simply more 32-bit ones as input. For example, some variants of the divide instruction use EDX:EAX (or RDX:RAX) for its input.

That would mean you have to double the amount of state you track. The hardware cost of doing this is ~= the cost of doubling the number of 64-bit registers. The number of transistors used for storing register data is negligible compared to the cost of the "metadata" and the handling around it. Why not just have more registers then?

"Just allow us to partially update the upper halves of the registers" is the sort of thing someone who understands software but not hardware would ask. It's 99% of the cost of just having twice the registers, but not nearly as useful, and it would introduce a lot of potential performance pitfalls. (Any instruction that might update only partially now has to wait for all the previous results on the register.)
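To make the dependency point concrete, here is a rough Python model of register writes, not a description of any real pipeline. The function names are made up for illustration, and `write_upper32` models the hypothetical "update only the upper half" instruction being discussed, which does not exist on x86-64. The two facts it leans on are real: a 32-bit write on x86-64 zeroes the upper 32 bits, while an 8-bit write keeps the remaining bits.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = (1 << 64) - 1


def write_reg32(old: int, value: int) -> int:
    """32-bit write (e.g. to EAX): upper half is zeroed, so `old` is never read."""
    return value & MASK32


def write_reg8_low(old: int, value: int) -> int:
    """8-bit write (e.g. to AL): the other 56 bits come from `old`, so the
    result cannot be produced until the previous value is known."""
    return (old & ~0xFF & MASK64) | (value & 0xFF)


def write_upper32(old: int, value: int) -> int:
    """Hypothetical upper-half write: must merge with the old low half,
    so it also depends on the previous value of the register."""
    return (old & MASK32) | ((value & MASK32) << 32)


rax = 0x1122_3344_5566_7788
print(hex(write_reg32(rax, 0xDEADBEEF)))    # 0xdeadbeef
print(hex(write_reg8_low(rax, 0xAB)))       # 0x11223344556677ab
print(hex(write_upper32(rax, 0xDEADBEEF)))  # 0xdeadbeef55667788
```

Only `write_reg32` ignores its `old` argument; the other two are merges, which is where the "has to wait for previous results" cost comes from unless each half is tracked as separate state.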
|
Of the times I've used Asm, there have been far more situations where an extra 32-bit register would be more useful than a 64-bit one, and having them combine automatically into high and low halves is more useful than you think. Tight loops with non-parallelisable bit/byte manipulations of this sort occur quite often in things like data compression and emulation.
> Any instruction that might update only partially now has to wait for all the previous results on the register.
Does it? Once again, you seem to be thinking in "partial registers" rather than "just another one" --- and I argue that this conceptual difference is very important. E.g. you can work with both AL and AH independently, then use them together as AX --- at which point, yes, the processor will need to wait for the results from both, but then it can combine them implicitly without having to waste time and space decoding and executing the instructions to do it.
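As a sketch of what "combine them implicitly" means at the value level, the Python below models AL and AH as two independent 8-bit values that are only joined when AX is read, and the EDX:EAX pair as the 64-bit dividend of the unsigned 32-bit divide. The helper names `combine_ax` and `div_edx_eax` are invented for this example; it says nothing about how a real core schedules the work.

```python
MASK8 = 0xFF
MASK32 = 0xFFFF_FFFF


def combine_ax(ah: int, al: int) -> int:
    """Reading AX sees AH as the high byte and AL as the low byte."""
    return ((ah & MASK8) << 8) | (al & MASK8)


def div_edx_eax(edx: int, eax: int, divisor: int) -> tuple[int, int]:
    """Unsigned 32-bit DIV: divides EDX:EAX by `divisor`,
    returning (quotient for EAX, remainder for EDX)."""
    dividend = ((edx & MASK32) << 32) | (eax & MASK32)
    quotient, remainder = divmod(dividend, divisor)
    if quotient > MASK32:
        # Real hardware raises a divide error (#DE) here.
        raise OverflowError("quotient does not fit in 32 bits")
    return quotient, remainder


# Two independent byte computations, joined only at the point of use.
al = (0x37 + 0x05) & MASK8   # 0x3c
ah = 0x12 ^ 0xFF             # 0xed
print(hex(combine_ax(ah, al)))    # 0xed3c
print(div_edx_eax(1, 0, 2))       # (2147483648, 0)
```

In real code the join costs nothing in instruction count: the `(ah << 8) | al` step is what the register naming does for you, which is the saving being argued for here.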