by mdpye 912 days ago
Because then it would need more registers for the other purpose?

But actually the decision about which general-purpose registers to use for what is made at compile time (hence we're discussing a compiler flag here; the frame pointer is not a hardware-dictated feature), so the question is actually kind of moot. If the compiler is out of registers to allocate and instead uses the stack, the CPU isn't reasonably going to be able to undo that.


Sure, but wouldn’t it make sense to extend the instruction set to allow the compiler to use these registers instead of reserving them for speculative / out-of-order execution? It was just a thought I had after watching a talk by a compiler guy: https://youtu.be/2EWejmkKlxs?feature=shared&t=2409
jcranmer got it right. Read that reply (and mine). And then maybe rewatch what Chandler Carruth says.

The current practice allows for CPUs to transparently increase their physical register count (to gain performance) and still run old code -- and older CPUs can still run new code. That's usually quite practical...
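
To make the "transparent" part concrete, here is a toy sketch of renaming (Python, purely illustrative). The function name `rename` and the instruction format (destination register, tuple of source registers) are made up for this example, and it never recycles physical registers the way a real CPU does at retirement. The point is only that the same program, written against a fixed set of architectural names, runs unchanged whatever `num_phys_regs` happens to be.

```python
def rename(instructions, num_arch_regs, num_phys_regs):
    """Toy renamer: each write to an architectural register gets a fresh physical one.

    `instructions` is a list of (dst, srcs) pairs of architectural register numbers.
    This is a hypothetical model, not how any particular CPU implements renaming.
    """
    # Initially architectural register i lives in physical register i.
    mapping = {r: r for r in range(num_arch_regs)}
    # Everything beyond the architectural count is spare capacity the ISA never names.
    free = list(range(num_arch_regs, num_phys_regs))
    renamed = []
    for dst, srcs in instructions:
        # Sources read whichever physical register currently holds the value.
        phys_srcs = tuple(mapping[s] for s in srcs)
        if not free:
            # A real CPU would stall until an older instruction retires.
            raise RuntimeError("out of physical registers")
        # The write gets a new physical register, so it cannot clobber
        # a value that an older, still in-flight instruction needs.
        mapping[dst] = free.pop(0)
        renamed.append((mapping[dst], phys_srcs))
    return renamed


# Three instructions that keep reusing the architectural names r0 and r1.
program = [(0, (1,)), (1, (0,)), (0, (1,))]
print(rename(program, num_arch_regs=2, num_phys_regs=8))
```

With only two architectural names the program still spreads across physical registers 2, 3 and 4. Giving the model more physical registers changes how many writes can be in flight, not the program.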

Adding more register names takes more bits for the register numbers -- which leads to larger instructions. It also leads to more complicated encodings if we want backwards compatibility. AMD64 does that by adding an optional prefix byte that carries a payload of 4 more instruction bits. That's one bit each for the three possible register names encoded in a traditional IA32 instruction + a bit to indicate whether to operate on 32-bit or 64-bit data (the actual rules are a bit more complex). Intel published a whitepaper recently suggesting a future encoding with a different (optional) prefix that encodes 8 more bits -- so each of the three register names can be extended to 5 bits (32 register names). It all ends up being quite complicated + new code won't run on older CPUs, which is not great.
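
For the curious, a rough sketch of how those 4 payload bits of the AMD64 prefix (REX) combine with the 3-bit register fields of the ModRM byte. The helper name `decode_rex_modrm` is made up here, and it only handles the simple register-to-register form (no SIB byte, no memory operands); the operand-size result is the simplified rule, as noted above.

```python
def decode_rex_modrm(rex, modrm):
    """Combine a REX prefix byte with a ModRM byte (register-direct form only).

    Hypothetical helper for illustration; real decoders handle many more cases.
    """
    # A REX prefix has the bit pattern 0100WRXB, i.e. 0x40..0x4F.
    if rex & 0xF0 != 0x40:
        raise ValueError("not a REX prefix")
    w = (rex >> 3) & 1  # 1 = 64-bit operand size (simplified rule)
    r = (rex >> 2) & 1  # extends the ModRM.reg field
    x = (rex >> 1) & 1  # extends the SIB.index field (unused in this sketch)
    b = rex & 1         # extends the ModRM.rm field
    # Each 3-bit register field gains a fourth, high bit: 8 names become 16.
    reg = (r << 3) | ((modrm >> 3) & 0b111)
    rm = (b << 3) | (modrm & 0b111)
    return {"operand_bits": 64 if w else 32, "reg": reg, "rm": rm}


# Bytes 4D 89 C8 encode `mov r8, r9`: REX = 0x4D, opcode = 0x89, ModRM = 0xC8.
print(decode_rex_modrm(0x4D, 0xC8))
```

Here reg comes out as 9 (r9, the source) and rm as 8 (r8, the destination), neither of which can be named without the prefix.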

I think you are suggesting not just bigger register names but also doing away with register renaming -- that would be... less than entirely useful because you would lose almost all your out-of-order capability and thereby almost all your ability to hide cache misses. Cache misses are very, very hard to predict statically (before actually running the code on a real CPU with real data) so good luck trying to do magic ahead-of-time allocation of those registers...