| HN Mirror


by haberman 467 days ago

I think it's important to note that a primary motivation of the tail call interpreter design is to be less vulnerable to the whims of the optimizer. From my original blog article about this technique (https://blog.reverberate.org/2021/04/21/musttail-efficient-i...):

> Theoretically, this control flow graph paired with a profile should give the compiler all of the information it needs to generate the most optimal code [for a traditional switch()-based interpreter]. In practice, when a function is this big and connected, we often find ourselves fighting the compiler. It spills an important variable when we want it to keep it in a register. It hoists stack frame manipulation that we want to shrink wrap around a fallback function invocation. It merges identical code paths that we wanted to keep separate for branch prediction reasons. The experience can end up feeling like trying to play the piano while wearing mittens.

That second-to-last sentence is exactly what has happened here. The "buggy" compiler merged identical code paths, leading to worse performance.
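To make "merged identical code paths" concrete, here is a minimal sketch of a computed-goto loop. The opcodes and the `run_goto` name are made up for illustration and are not taken from CPython. Each handler ends with its own copy of the indirect jump so that the branch predictor can learn a separate pattern per opcode, but nothing in the source stops an optimizer from folding those copies back into a single shared jump.

```c
#include <stdint.h>

/* Hypothetical opcodes, invented for this sketch. */
enum { BC_INC, BC_SHL1, BC_HALT };

/* Uses the GNU "labels as values" extension (GCC and Clang). */
uint64_t run_goto(const uint8_t *pc, uint64_t acc) {
    static void *const targets[] = { &&do_inc, &&do_shl1, &&do_halt };

    /* Assumes well-formed bytecode that ends in BC_HALT. */
    goto *targets[*pc];

do_inc:
    acc++;
    pc++;
    goto *targets[*pc];   /* dispatch copy 1 */

do_shl1:
    acc <<= 1;
    pc++;
    goto *targets[*pc];   /* dispatch copy 2 */

do_halt:
    return acc;
}
```

Whether the two dispatch copies survive into the machine code is entirely up to the compiler's heuristics, which is the fragility described above.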

The "fixed" compiler no longer does this, but the fix is basically just tweaking a heuristic inside the compiler. There's no actual guarantee that this compiler (or another compiler) will continue to have the heuristic tweaked in the way that benefits us the most.

The tail call interpreter, on the other hand, lets us express the desired machine code pattern in the interpreter itself. Between the "musttail", "noinline", and "preserve_none" attributes, we can constrain the problem such that we are much less at the mercy of optimizer heuristics.
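A minimal sketch of that pattern, assuming a toy three-opcode bytecode (the opcodes, handler names, and `run` are hypothetical, not from any real interpreter). Every handler has the same signature, keeps the interpreter state in its arguments, and ends by tail-calling the next handler. The macros fall back to a plain `return` on compilers without `musttail`, in which case the tail call is no longer guaranteed. "preserve_none" is left out because its availability varies by compiler version and target.

```c
#include <stdint.h>

#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* With musttail the compiler must emit a tail call or reject the code.
   The fallback is an ordinary return, which gives no such guarantee. */
#if __has_attribute(musttail)
#define MUSTTAIL __attribute__((musttail))
#else
#define MUSTTAIL
#endif

#if __has_attribute(noinline)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* Hypothetical three-opcode bytecode, invented for this sketch. */
enum { OP_INC, OP_SHL1, OP_HALT, OP_COUNT };

/* All handlers share one signature so that tail calls between them are
   legal; the state (pc, acc) travels in the arguments. */
typedef uint64_t op_fn(const uint8_t *pc, uint64_t acc);

static op_fn op_inc, op_shl1, op_halt;

static op_fn *const dispatch_table[OP_COUNT] = {
    [OP_INC] = op_inc, [OP_SHL1] = op_shl1, [OP_HALT] = op_halt,
};

/* Assumes well-formed bytecode: every opcode < OP_COUNT, ends in OP_HALT. */
#define DISPATCH() MUSTTAIL return dispatch_table[*pc](pc, acc)

/* Rare path, kept out of line so it cannot bloat the fast path. */
NOINLINE static uint64_t op_inc_saturated(const uint8_t *pc, uint64_t acc) {
    pc++;                 /* acc stays at UINT64_MAX */
    DISPATCH();
}

static uint64_t op_inc(const uint8_t *pc, uint64_t acc) {
    if (acc == UINT64_MAX) {
        MUSTTAIL return op_inc_saturated(pc, acc);
    }
    acc++;
    pc++;
    DISPATCH();
}

static uint64_t op_shl1(const uint8_t *pc, uint64_t acc) {
    acc <<= 1;            /* unsigned shift, wraps modulo 2^64 */
    pc++;
    DISPATCH();
}

static uint64_t op_halt(const uint8_t *pc, uint64_t acc) {
    (void)pc;
    return acc;
}

/* Entry point: an ordinary call into the first handler. */
uint64_t run(const uint8_t *program, uint64_t acc) {
    return dispatch_table[*program](program, acc);
}
```

Because each handler is its own small function with its own dispatch, the per-opcode indirect jump is part of the program's structure rather than something the optimizer has to be coaxed into preserving.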

For this reason, I think the benefit of the tail call interpreter is more than just a 3-5% performance improvement. It's a reliable performance improvement that may be even greater than 3-5% on some compilers.

2 comments

kenjin4096 467 days ago

This is a good point. We already observed this in our LTO and PGO builds for the computed goto interpreter. On modern compilers, each LTO+PGO build has huge variance (1-2%) for the CPython interpreter. On macOS, we already saw a huge regression in performance because Xcode just decided to stop making LTO and PGO work properly on the interpreter. Presumably, the tail call interpreter would be immune to this.


sunshowers 467 days ago

In full agreement with this. There is tremendous value in having code whose performance is robust to various compiler configurations.
