| HN Mirror

by Tuna-Fish 2851 days ago

> I've heard the "partial register stall" excuse multiple times, ostensibly valid but only if you insist on thinking in "partial registers" instead of simply more 32-bit ones as input. For example, some variants of the divide instruction use EDX:EAX (or RDX:RAX) for its input.

That would mean doubling the amount of state you track. The hardware cost of doing this is roughly the cost of doubling the number of 64-bit registers. The transistors used for storing register data are negligible compared to the cost of the "metadata" and the handling around it. Why not just have more registers then?

"Just allow us to partially update the upper halves of the registers" is the sort of thing someone who understands software but not hardware would ask. It's 99% of the cost of just having twice the registers, but not nearly as useful, and it would introduce a lot of potential performance pitfalls. (Any instruction that might update only partially now has to wait for all the previous results on the register.)

2 comments

userbinator 2851 days ago

> It's 99% of the cost of just having twice the registers, but not nearly as useful, and it would introduce a lot of potential performance pitfalls.

Of the times I've used Asm, there have been far more situations where an extra 32-bit register would be more useful than a 64-bit one, and having them combine automatically into high and low halves is more useful than you think. Tight loops with non-parallelisable bit/byte manipulations of this sort occur quite often in things like data compression and emulation.

> Any instruction that might update only partially now has to wait for all the previous results on the register

Does it? Once again, you seem to be thinking in "partial registers" rather than "just another one" --- and I argue that this conceptual difference is very important. E.g. you can work with both AL and AH independently, then use them together as AX --- at which point, yes, the processor will need to wait for the results from both, but then it can combine them implicitly without having to waste time and space decoding and executing the instructions to do it.
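
To make the AL/AH/AX relationship concrete, here is a minimal C sketch of the architectural behaviour only. It says nothing about how a real core renames or merges these registers, and the `regfile` type and function names are hypothetical, chosen for illustration.

```c
#include <stdint.h>

/* Hypothetical model: one 64-bit register, with AL = bits 0-7 and AH = bits 8-15. */
typedef struct { uint64_t rax; } regfile;

/* Writing AL replaces the low byte and preserves every other bit. */
static inline void write_al(regfile *r, uint8_t v) {
    r->rax = (r->rax & ~UINT64_C(0xFF)) | v;
}

/* Writing AH replaces bits 8-15 and preserves every other bit. */
static inline void write_ah(regfile *r, uint8_t v) {
    r->rax = (r->rax & ~UINT64_C(0xFF00)) | ((uint64_t)v << 8);
}

/* Reading AX sees both bytes at once; this is the "implicit combine". */
static inline uint16_t read_ax(const regfile *r) {
    return (uint16_t)(r->rax & 0xFFFF);
}

/* Without partial registers, software does the combine itself. */
static inline uint16_t combine_explicit(uint8_t hi, uint8_t lo) {
    return (uint16_t)(((unsigned)hi << 8) | lo);
}
```

In this model `read_ax` after `write_al` and `write_ah` returns the same value as `combine_explicit`; the disagreement in the thread is about what it costs hardware to provide that combine for free.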

simias 2851 days ago

>Of the times I've used Asm, there's been far more situations where an extra 32-bit register would be more useful than a 64-bit one

You need a 64-bit register any time you want to store a pointer, though, unless you want to use some kind of segmented memory model. I don't think anybody wants to go back to that, although I'm not one to criticize weird fetishes.

Clearly when you look at the fine details of AMD64 it looks like a weird Frankenstein's monster of an instruction set. The REX prefix holds the MSBs of the register numbers, since 32-bit opcodes only allowed three bits to encode 8 registers. The same prefix is used to set the width of memory target operands, except that some instructions default to 32 bits while others default to 64. R12 has weird encoding quirks because it matches RSP in the "low" register set and that register has specific semantics when used as a base register...
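
The register-number split described above can be sketched in a few lines of C. This is an illustrative model of the bit layout only, not an assembler; the helper names are made up for the example.

```c
#include <stdint.h>

/* Register numbers as used in x86-64 encodings. */
enum { RSP = 4, RBP = 5, R12 = 12, R13 = 13 };

/* REX prefix layout is 0100WRXB: W selects 64-bit operand size,
   R, X and B each supply the fourth (top) bit of a register number. */
static inline uint8_t rex_prefix(int w, int r, int x, int b) {
    return (uint8_t)(0x40 | ((w & 1) << 3) | ((r & 1) << 2) | ((x & 1) << 1) | (b & 1));
}

/* The low three bits go in the ModRM/SIB fields inherited from 32-bit x86. */
static inline int reg_low3(int reg) { return reg & 7; }

/* The top bit goes in the REX prefix. */
static inline int reg_rex_bit(int reg) { return (reg >> 3) & 1; }

/* For memory operands, rm == 100b in ModRM means "a SIB byte follows",
   so any base register whose low three bits are 100b needs a SIB byte.
   That covers RSP and, because only the low bits are checked, R12. */
static inline int base_needs_sib(int base_reg) { return reg_low3(base_reg) == 4; }
```

For example, `rex_prefix(1, 0, 0, 0)` gives `0x48`, the familiar byte in front of many 64-bit instructions, and `base_needs_sib(R12)` is true for the same reason `base_needs_sib(RSP)` is.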

I wonder how much die area is used on a modern CPU just to deal with all this cruft and translate it into a saner RISC instruction set internally.

BeeOnRope 2851 days ago

Well the 16-bit and 8-bit registers work this way, and actual hardware shows this is far from free. We've had partial register stalls, merging uops, and other weird behavior (for example, ah today works very differently from al).

It's easy to say "think of them as separate registers" just as it's easy to say "think of them as partial registers" - but the ISA definition is such that they have to appear as partial registers in the scenarios where it would be visible.

So sure, you could make hardware that would rename and treat them as different registers (as has been done on some x86 versions), but then when you read a wider portion you need to combine them, which won't be free (yes, it happens "implicitly", but that doesn't somehow make it much easier for the hardware).

There are also plenty of cases where you want zero-extension for functional reasons, especially in compiled code where things like casts to a larger size become free. Cleverly using both halves of a register and using the implicit combination into a full 64-bit value is much rarer than just wanting to store a 32-bit value and sometimes wanting to use it as a zero-extended 64-bit value.
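
A rough C model of the two write behaviours being contrasted, as the x86-64 ISA defines them: a 32-bit write clears the upper half, while a 16-bit write merges into the old value. The function names are hypothetical and the model covers architectural results only, not timing.

```c
#include <stdint.h>

/* 32-bit destination on x86-64: the upper 32 bits are zeroed,
   so the new value does not depend on the old register contents. */
static inline uint64_t write32(uint64_t old, uint32_t v) {
    (void)old;               /* deliberately unused: no dependency on the old value */
    return (uint64_t)v;
}

/* 16-bit destination: bits 16-63 are preserved, so the new value
   depends on whatever was in the register before. */
static inline uint64_t write16(uint64_t old, uint16_t v) {
    return (old & ~UINT64_C(0xFFFF)) | v;
}

/* An unsigned 32-to-64-bit cast is exactly the zero-extension that
   write32 already performed, which is why it can cost nothing. */
static inline uint64_t widen(uint32_t v) {
    return (uint64_t)v;
}
```

The `old` parameter is the whole point of the comparison: `write32` can ignore it, `write16` cannot.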

bogomipz 2850 days ago

>"We've had partial register stalls, merging uops, and other weird behavior (for example, ah today works very differently from al)."

Can you elaborate on the last behavior, that "ah today works very differently from al"? How do they differ? I hadn't heard this before.

BeeOnRope 2850 days ago

You can find all the gory details at:

https://stackoverflow.com/a/45660140/149138

ant6n 2851 days ago

I think ARM64 has instructions for inserting ranges of bits from one register into another.
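
AArch64 does have such an instruction: BFI (bitfield insert). As a sketch of what `BFI Xd, Xn, #lsb, #width` computes, here is a C model; `bfi64` is a name invented for this example, and the assertion only approximates the operand ranges the instruction accepts.

```c
#include <assert.h>
#include <stdint.h>

/* Copy the low `width` bits of src into dst starting at bit `lsb`,
   leaving all other bits of dst unchanged. */
static inline uint64_t bfi64(uint64_t dst, uint64_t src, unsigned lsb, unsigned width) {
    assert(width >= 1 && lsb < 64 && width <= 64 - lsb);
    /* Build a mask of `width` ones; shifting by 64 is undefined in C, so special-case it. */
    uint64_t field = (width == 64) ? ~UINT64_C(0) : ((UINT64_C(1) << width) - 1);
    uint64_t mask = field << lsb;
    return (dst & ~mask) | ((src << lsb) & mask);
}
```

With `lsb = 32` and `width = 32` this performs the "write the upper half, keep the lower half" operation discussed in this thread, but as an explicit instruction that names the old value as an input.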

haberman 2851 days ago

Interesting. It seems then that we can understand the difference between the 16->32 design of IA-32 (1985) and the 32->64 design of x86-64 (2003) as being heavily influenced by the fact that out-of-order execution had become an important consideration in 2003?
