Hacker News new | ask | show | jobs
by userbinator 2849 days ago
It's 99% of the cost of just having twice the registers, but not nearly as useful, and it would introduce a lot of potential performance pitfalls.

Of the times I've used Asm, there's been far more situations where an extra 32-bit register would be more useful than a 64-bit one, and having them combine automatically into high and low halves is more useful than you think. Tight loops with non-parallelisable bit/byte manipulations of this sort occur quite often in things like data compression and emulation.

Any instruction that might update only partially now has to wait for all the previous results on the register

Does it? Once again, you seem to be thinking in "partial registers" rather than "just another one" --- and I argue that this conceptual difference is very important. E.g. you can work with both AL and AH independently, then use them together as AX --- at which point, yes, the processor will need to wait for the results from both, but then it can combine them implicitly without having to waste time and space decoding and executing the instructions to do it.

3 comments

>Of the times I've used Asm, there's been far more situations where an extra 32-bit register would be more useful than a 64-bit one

You need a 64 bit register any time you want to store a pointer though unless you want to use some kind of a segmented memory model. I don't think anybody wants to go back to that although I'm not one to criticize weird fetishes.

Clearly when you look at the fine details of AMD64 it looks like a weird frankenstein monster of an instruction set. REX prefix holding the MSBs of registers since 32bit opcodes only allowed three bits to encode 8 registers. Same prefix used to set the width of memory target operands, except that some instructions default to 32bits while others default to 64. R12 has weird encoding quircks because it matches RSP in the "low" register set and that register has specific semantics when used as a base register...

I wonder how much die area is used on a modern CPU just to deal with all this cruft and translate it into a saner RISC instruction set internally.

Well the 16-bit and 8-bit registers work this way, and actual hardware shows this is far from free. We've had partial register stalls, merging uops, and other weird behavior (for example, ah today works very differently from al).

It's easy to say "think of them as separate registers" just as it's easy to say "think of them as a partial registers" - but the ISA definition is such that they have to appear as partial registers in the scenarios where it would be visible.

So sure, you could make hardware that would rename and treat them as different registers (at it has been done on some x86 versions), but then when you read a wider portion you need to combine them, which won't be free (yes it happens "implicitly" but that doesn't somehow make it much easier for the hardware).

There are also plenty of cases where you want zero-extension for functional reasons, especially in compiled code where things like casts to a larger size become free. Cleverly using both halves of a register and using the implicit combination into a full 64-bit value is much rarer that just wanting to store a 32-bit value and sometimes wanting to use it as a zero-extended 64-bit value.

>"We've had partial register stalls, merging uops, and other weird behavior (for example, ah today works very differently from al)."

Can you elaborate on the last behavior, that " ah today works very differently from al"? How do they differ? I hadn't heard this before.

You can find all the gory details at:

https://stackoverflow.com/a/45660140/149138

I think ARM64 has instructions for inserting ranges of bits from one register into another.