by JonChesterfield 128 days ago
The game is deeper than that. Your model is probably about right for the compiler you're using. It shouldn't be - compilers can do better - but it's all a work in progress.

Small scale stuff is you don't usually spill around every call site. One of the calls is the special "return" branch, the other N can probably share some of the register shuffling overhead if you're careful with allocation.

Bigger is that the calling convention is not a constant. Leaf functions can get special cased, but so can non-leaf. Change the pattern of argument to fixed register / stack, change which registers are callee/caller saved. The entry point for calls from outside the current module needs to match the platform ABI you claimed it'll follow but nothing else does.
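
A toy sketch of that point, not a model of any real compiler or ABI: the register names, the caller-saved set, and the helper `saves_needed` are all invented for illustration. It only counts which live registers need a save/restore around one call, depending on what the caller is allowed to assume about the callee.

```python
# Toy model: which live registers must be saved around a call.
# Register names and clobber sets are made up for illustration.

# Hypothetical platform ABI: these registers may be clobbered by any callee.
ABI_CALLER_SAVED = frozenset({"r0", "r1", "r2", "r3"})


def saves_needed(live_across_call, clobbered_by_callee):
    """Registers the caller has to save and restore around one call."""
    return sorted(set(live_across_call) & set(clobbered_by_callee))


# Values the caller keeps live across the call.
live = {"r1", "r2", "r3"}

# External callee: body unknown, so assume it clobbers every caller-saved register.
external = saves_needed(live, ABI_CALLER_SAVED)

# Internal callee: body is visible and, in this example, only writes r0 and r3.
internal = saves_needed(live, {"r0", "r3"})

print(external)  # ['r1', 'r2', 'r3']
print(internal)  # ['r3']
```

Same caller, same live values; the only thing that changed is the contract with the callee, which is the part the platform ABI fixes only for calls that cross the module boundary.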

The inlining theme hints at this. Basic blocks _are_ functions that are likely to have a short list of known call sites, each of which can have the calling convention chosen by the backend, which is what the live in/out of blocks is about. It's not inlining that makes any difference to regalloc, it's being more willing to change the calling convention on each function once you've named it "basic block".

1 comments

Why is almost no one in this comment thread willing to face the scenario where the function call has to actually happen, and be an actual function call? The reactions are either "no-no-no-no, the call will be inlined, don't you worry your pretty head" or "well, then the compiler will just use less registers to make less spills", which precisely agrees with my point that having more registers ain't necessarily all that useful.

> Small scale stuff is you don't usually spill around every call site.

Well duh: it's small, so even just 8 registers is likely enough for it. So again, why bother with cumbersome schemes to extend to 32 registers?

And this problem actually exists; that's why SPARC tried register windows, and even crazier schemes have been proposed on the software side of things, e.g. [0] (seriously, read this). And it's 30 years old, and IIUC nothing much came out of it, so excuse me if I'm somewhat skeptical about "compilers can do better - but it's all a work in progress" claims. Perhaps they already do the best they can for general-purpose CPUs. Good thing we have other kinds of processing units readily available nowadays.

[0] David W. Wall, "Global Register Allocation at Link Time", 1986, https://dl.acm.org/doi/10.1145/12276.13338