by dexen 1687 days ago

Beyond the reasons and limitations explained in sibling posts, having a large number of (logical, instruction-level) registers also inflates instruction size, which diminishes instruction density, which in turn lowers performance. So there is a trade-off between code density and the number of registers. Hear me out.

The CPU has limited memory bandwidth: the larger the instructions, the more bytes need to be loaded from memory to execute them. The same goes for cache size: the more space each instruction takes, the fewer instructions fit in the cache. Lastly, there's the complexity and latency of the instruction decoder. Keeping instructions short and the instruction set "dense" avoids this potential performance loss.
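To put rough numbers on the density argument, here is a minimal sketch that counts how many instructions of a given length fit in one cache line and in an instruction cache. The 64-byte line, the 32 KiB L1i and the 2/4/5-byte instruction lengths are illustrative assumptions, not figures for any particular CPU.

```python
# Illustrative sizes only (assumed, not taken from any specific CPU).
CACHE_LINE_BYTES = 64
L1I_BYTES = 32 * 1024


def instructions_per(capacity_bytes: int, insn_bytes: int) -> int:
    """Whole fixed-length instructions that fit in capacity_bytes.

    Simplification: instructions straddling a line boundary are ignored.
    """
    return capacity_bytes // insn_bytes


for insn_bytes in (2, 4, 5):
    print(
        f"{insn_bytes}-byte instructions: "
        f"{instructions_per(CACHE_LINE_BYTES, insn_bytes)} per line, "
        f"{instructions_per(L1I_BYTES, insn_bytes)} per L1i"
    )
```

Under these assumptions, growing the instruction from 4 to 5 bytes cuts the number held per line from 16 to 12, and every fetch brings in correspondingly fewer instructions.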

Any instruction that refers to a register needs a certain number of bits in its operand portion to indicate which specific register(s) to use [1][2][3]. For example, on 8-register x86 an operand generally uses 3 bits just to select the register. On 16-register x86_64, it takes 4 bits. If we wanted to address all 200 physical registers, that would require a full 8 bits per operand just to select the register. Certain instructions (data transfer, arithmetic and bitwise operations, comparisons, etc.) naturally use two or more registers, so multiply that accordingly.
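The arithmetic above is ceil(log2(N)) bits per register operand, multiplied by the number of register operands. A small sketch, where `register_field_bits` and `operand_bits` are hypothetical helper names; real encodings are messier (on x86_64, for instance, the fourth register bit is carried in the REX prefix rather than in the ModRM byte itself).

```python
def register_field_bits(num_registers: int) -> int:
    """Bits needed to name one of num_registers registers, i.e. ceil(log2(N))."""
    # (N - 1).bit_length() equals ceil(log2(N)) for N >= 1, with no float rounding.
    return (num_registers - 1).bit_length()


def operand_bits(num_registers: int, register_operands: int) -> int:
    """Total bits one instruction spends on register specifiers."""
    return register_field_bits(num_registers) * register_operands


for n in (8, 16, 200):
    print(
        f"{n:>3} registers: {register_field_bits(n)} bits/operand, "
        f"{operand_bits(n, 2)} bits for 2 operands, "
        f"{operand_bits(n, 3)} bits for 3 operands"
    )
```

Going from 16 to 200 registers turns the 12 bits of specifiers in a three-register instruction into 24, before any opcode or immediate bits are counted.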

Since using this many registers yields only diminishing returns in performance (and also requires very heavy lifting on the compiler's part [4]), the trade-off chosen is that the compiler works with the small, architecture-defined set of registers, and the processor at runtime uses the spare physical registers to speed up some code through instruction-level parallelism.
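The runtime mechanism described here is usually called register renaming. Below is a toy model under heavy simplifying assumptions: `RenameTable` is a hypothetical class, physical registers are never freed (real cores recycle them when instructions retire), and there is no readiness tracking or misprediction recovery. It only shows how two writes to the same architectural register land in different physical registers, so the second computation need not wait for readers of the first value.

```python
from collections import deque


class RenameTable:
    """Toy register-renaming model: maps architectural to physical registers.

    Assumed simplifications: physical registers are never returned to the
    free list, and there is no retirement, readiness tracking or flush logic.
    """

    def __init__(self, num_arch: int, num_phys: int) -> None:
        if num_phys < num_arch:
            raise ValueError("need at least one physical register per architectural one")
        # Initially architectural register i lives in physical register i.
        self.mapping = list(range(num_arch))
        self.free = deque(range(num_arch, num_phys))

    def rename(self, dst: int, srcs: tuple[int, ...]) -> tuple[int, tuple[int, ...]]:
        """Rename one instruction 'dst <- op(srcs)' onto physical registers."""
        # Sources use the mapping in effect *before* this instruction writes dst.
        phys_srcs = tuple(self.mapping[s] for s in srcs)
        if not self.free:
            raise RuntimeError("no free physical register (a real core would stall)")
        phys_dst = self.free.popleft()
        self.mapping[dst] = phys_dst
        return phys_dst, phys_srcs


# 16 architectural registers (as in x86_64) backed by 200 physical ones.
rt = RenameTable(num_arch=16, num_phys=200)
program = [
    (1, (2, 3)),  # r1 = r2 + r3
    (4, (1,)),    # r4 = r1 * 2
    (1, (5, 6)),  # r1 = r5 + r6  (reuses r1, but gets a fresh physical register)
    (7, (1, 0)),  # r7 = r1 + r0
]
for dst, srcs in program:
    phys_dst, phys_srcs = rt.rename(dst, srcs)
    print(f"r{dst} <- {srcs}  =>  p{phys_dst} <- {phys_srcs}")
```

With these assumptions the two writes to r1 go to p16 and p18, so the third instruction does not have to wait until the second one has read the old r1.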

[Edit]

There's one more common circumstance where a large number of registers is undesirable: a change of execution context (thread switch, process switch, interrupt). Typically all architecturally visible registers are saved to memory on a context change, and a new set is loaded for the new context. The more registers there are, the more work there is to do. The extra physical registers, by contrast, are managed directly by the CPU and act more like a cache than like directly addressed registers, so they don't need to be saved to memory.
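As a rough illustration of the save/restore cost, the bytes written per context save are just count × width summed over the architecturally visible register classes. The sketch ignores flags, segment, mask and control state; the x86_64 counts (16 general-purpose registers of 8 bytes, 32 AVX-512 ZMM registers of 64 bytes) are real, while the 200-register machine is hypothetical.

```python
def context_bytes(register_classes: dict[str, tuple[int, int]]) -> int:
    """Bytes to save (or restore) for one context: sum of count * width_bytes."""
    return sum(count * width for count, width in register_classes.values())


machines = {
    "x86_64, GPRs only": {"gpr": (16, 8)},
    "x86_64, GPRs + AVX-512 ZMM": {"gpr": (16, 8), "zmm": (32, 64)},
    "hypothetical 200 x 64-bit GPRs": {"gpr": (200, 8)},
}
for name, classes in machines.items():
    print(f"{name}: {context_bytes(classes)} bytes per save")
```

Each save of the outgoing context is paired with a restore of the incoming one, so the memory traffic per switch is roughly double these figures.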

[1] Aside from certain specialized instructions that implicitly use a particular register; for example, in x86 many instructions implicitly use the FLAGS register, and DIV/IDIV integer division implicitly uses the AX and DX registers.

[2] Aside from certain instruction prefixes that influence which register is used; in x86, for example, segment register overrides.

[3] Aside from certain architectures where registers were organized in a "file" and were available only through a "window", i.e., an implicit context that served as the base for register addressing. Instruction operands referred to registers relative to the current window, and the whole window could be shifted by specialized instructions, typically on function entry/exit and similar. This more or less exposed the whole set of "hardware registers" at the architecture level, albeit in a somewhat constrained, instruction-dense way.
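The windowed register file in footnote [3] can be sketched as an index calculation. The model below is SPARC-like (r0-r7 shared globals, r8-r15 outs, r16-r23 locals, r24-r31 ins, with a caller's outs becoming the callee's ins), but `NWINDOWS`, the physical layout formula and the `save`/`restore` helpers are illustrative assumptions, and window overflow/underflow traps (spilling windows to memory when the file is full) are left out.

```python
NWINDOWS = 8              # assumed number of windows; real implementations vary
NGLOBALS = 8              # r0-r7 are shared by every window
WINDOWED = 16 * NWINDOWS  # each window adds 8 locals + 8 outs to the physical file


def physical_index(cwp: int, r: int) -> int:
    """Map logical register r (0..31) in window cwp to a physical register slot."""
    if not 0 <= r < 32:
        raise ValueError("logical register must be in 0..31")
    if r < NGLOBALS:
        return r  # globals are not windowed
    # outs r8-r15, locals r16-r23, ins r24-r31, laid out relative to the window base
    return NGLOBALS + (cwp * 16 + (r - 8)) % WINDOWED


def save(cwp: int) -> int:
    """Function entry: shift to a new window (decrementing CWP, as SPARC's SAVE does)."""
    return (cwp - 1) % NWINDOWS


def restore(cwp: int) -> int:
    """Function exit: shift back to the caller's window."""
    return (cwp + 1) % NWINDOWS


caller = 3
callee = save(caller)
# The caller's out register r8 and the callee's in register r24 are the same cell,
# so arguments are passed without copying.
print(physical_index(caller, 8), physical_index(callee, 24))
# Locals are private to each window.
print(physical_index(caller, 16), physical_index(callee, 16))
print(restore(callee) == caller)
```

The shift happens implicitly on call and return, so instructions still need only 5 bits per register operand even though the physical file in this model holds 136 registers.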

[4] Deciding which registers to use, which values to spill to memory, etc. is non-trivial work for the compiler, and the complexity grows super-linearly with the number of registers.
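To make footnote [4] concrete, here is a deliberately simplified allocator sketch: variables that are live at the same time interfere and must get different registers, and whatever can't be given one of the k registers is spilled to memory. `greedy_allocate` is a hypothetical name, and this highest-degree-first greedy coloring stands in for the far more involved allocators real compilers use (Chaitin/Briggs-style graph coloring, linear scan, etc.). It illustrates the decision being made, not the complexity claim.

```python
def greedy_allocate(
    interference: dict[str, set[str]], k: int
) -> tuple[dict[str, int], list[str]]:
    """Give each variable one of k registers so that interfering variables differ.

    Variables are visited highest-degree first; any variable whose neighbours
    already occupy all k registers is spilled to memory instead.
    """
    assignment: dict[str, int] = {}
    spilled: list[str] = []
    # Sort by descending degree, then by name for a deterministic order.
    for var in sorted(interference, key=lambda name: (-len(interference[name]), name)):
        taken = {assignment[n] for n in interference[var] if n in assignment}
        free = [r for r in range(k) if r not in taken]
        if free:
            assignment[var] = free[0]
        else:
            spilled.append(var)
    return assignment, spilled


# Four variables that are all live at once: with only 3 registers, one must spill.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c"},
}
print(greedy_allocate(graph, k=3))
print(greedy_allocate(graph, k=4))
```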