Hacker News new | ask | show | jobs
by m0th87 3121 days ago
> For example, register allocation is a tedious process that bedeviled assembly programmers, whereas higher-level languages like C++ handle it automatically and get it right every single time.

Ideal register allocation is NP-complete, so a compiler can't get it right every single time.

I'm not sure how good in practice modern compilers are at this, but would be curious to know if there's some asm writers who can actually consistently outperform them.

4 comments

They're not saying that C++ compilers do the best possible register allocation, they're saying that C++ compilers generate a register allocation that works and doesn't contain bugs. Technically, spilling everything to memory and loading only what you need to issue instructions is "getting it right" by this definition. No compiler strives to get the "optimal" anything in the general case, but we do expect them to strive to be "correct" in all cases. The language we use determines which properties are included in our idea of "correctness".
I think "get it right" here is "have it work at all", not "get it fast".
Ah, that makes sense!
I found this example interesting. I found myself comparing it to how Rust does memory management, which is certainly not "automatic" as that would describe a garbage collected language.
Optimal register allocation has been polynomial time for more than 10 years - for some definition of optimal. IIRC it started with programs in SSA form and has dropped that requirement more recently. Modern GCC uses SSA form and I think LLVM might too.
GCC and LLVM do not retain SSA form by the time register allocation happens (they both convert to a non-SSA low-level IR before then).

It's also worth pointing out that "optimal" in theory doesn't necessarily correspond to optimal in practice. The hard problem of register allocation isn't coloring the interference graph (since there's not enough registers most of the time), it's deciding how best to spill (or split live ranges, or insert copies, or rematerialize, or ...) the excess registers. Plus, real-world architectures also have issues like requiring specific physical registers for certain instructions and subregister aliasing which are hard to model.

In practice, the most important criterion tends to be to avoid spilling inside loops. This means that rather simple heuristics are generally sufficient to optimally achieve that criterion, and in those cases, excessive spilling outside the loops isn't really going to show up in performance numbers. Thus heuristics are close enough to optimal that it's not worth the compiler time or maintenance to achieve optimality.

Yes, the fun in compilers: Even if every phase and every optimization actually produces optimal results, the combination is probably not optimal.

One deep problem is that there is no good optimization goal. Today's CPUs are too complex and unpredictable performance-wise.

Another problem is: Register pressure is one of the most important things to minimize, but how can the phases before register allocation do that? They use a surrogate, like virtual registers, and thus become heuristics even if they solve their theoretical problem optimally.

> IIRC it started with programs in SSA form and has dropped that requirement more recently.

No, it's only SSA form that has optimal register allocation in polynomial time, otherwise someone would've proved that P=NP (as its proven to NP-Hard). :)

That said, finding minimal SSA from arbitrary SSA is NP-Hard.

> Modern GCC uses SSA form and I think LLVM might too.

LLVM has always used SSA (this is relatively unsurprising given its origins in research and so much research of the period being on SSA).