I think it's important to note that a primary motivation of the tail call interpreter design is to be less vulnerable to the whims of the optimizer. From my original blog article about this technique (https://blog.reverberate.org/2021/04/21/musttail-efficient-i...):

> Theoretically, this control flow graph paired with a profile should give the compiler all of the information it needs to generate the most optimal code [for a traditional switch()-based interpreter]. In practice, when a function is this big and connected, we often find ourselves fighting the compiler. It spills an important variable when we want it to keep it in a register. It hoists stack frame manipulation that we want to shrink wrap around a fallback function invocation. It merges identical code paths that we wanted to keep separate for branch prediction reasons. The experience can end up feeling like trying to play the piano while wearing mittens.

That second-to-last sentence is exactly what has happened here. The "buggy" compiler merged identical code paths, leading to worse performance. The "fixed" compiler no longer does this, but the fix is basically just tweaking a heuristic inside the compiler. There's no actual guarantee that this compiler (or another compiler) will continue to have the heuristic tweaked in the way that benefits us the most.

The tail call interpreter, on the other hand, lets us express the desired machine code pattern in the interpreter itself. Between the "musttail", "noinline", and "preserve_none" attributes, we can basically constrain the problem such that we are much less at the mercy of optimizer heuristics.

For this reason, I think the benefit of the tail call interpreter is more than just a 3-5% performance improvement. It's a reliable performance improvement that may be even greater than 3-5% on some compilers.
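For anyone who hasn't seen the pattern, here is a minimal sketch in C of what expressing the machine code pattern in the interpreter itself can look like. It is not the upb or CPython implementation; the bytecode, the handler names, and the `run()` entry point are all hypothetical. Each opcode gets its own small function with an identical signature, the interpreter state (`pc`, `acc`, `cnt`) travels in arguments so it can stay in registers, and every handler ends with a `musttail` call to the next handler instead of returning to a central `switch` loop. The cold error path is marked `noinline` so it stays out of the hot handlers. I've left `preserve_none` out because it is only available in recent Clang on a few targets, and the `MUSTTAIL` macro degrades to a plain `return` on compilers without the attribute, in which case the tail call is no longer guaranteed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Guaranteed tail calls where supported (recent Clang and GCC). Elsewhere the
   macro expands to nothing and tail calls become best-effort. */
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

#if defined(__GNUC__)
#  define NOINLINE __attribute__((noinline))
#else
#  define NOINLINE
#endif

/* Hypothetical bytecode: one-byte opcodes, most with a one-byte operand. */
enum { OP_ADDI, OP_SETCNT, OP_LOOP, OP_HALT, NUM_OPS };

/* Every handler shares this exact signature, which musttail requires. */
typedef int64_t (*handler_t)(const uint8_t *pc, int64_t acc, int64_t cnt);

static int64_t op_addi(const uint8_t *pc, int64_t acc, int64_t cnt);
static int64_t op_setcnt(const uint8_t *pc, int64_t acc, int64_t cnt);
static int64_t op_loop(const uint8_t *pc, int64_t acc, int64_t cnt);
static int64_t op_halt(const uint8_t *pc, int64_t acc, int64_t cnt);
static int64_t op_invalid(const uint8_t *pc, int64_t acc, int64_t cnt);

static const handler_t dispatch_table[NUM_OPS] = {
    [OP_ADDI] = op_addi, [OP_SETCNT] = op_setcnt,
    [OP_LOOP] = op_loop, [OP_HALT] = op_halt,
};

static handler_t lookup(uint8_t op) {
    return op < NUM_OPS ? dispatch_table[op] : op_invalid;
}

/* Ends a handler by jumping (not calling) to the handler for *pc. */
#define DISPATCH(pc, acc, cnt) MUSTTAIL return lookup(*(pc))((pc), (acc), (cnt))

static int64_t op_addi(const uint8_t *pc, int64_t acc, int64_t cnt) {
    acc += (int8_t)pc[1]; /* signed 8-bit immediate */
    pc += 2;
    DISPATCH(pc, acc, cnt);
}

static int64_t op_setcnt(const uint8_t *pc, int64_t acc, int64_t cnt) {
    cnt = pc[1];
    pc += 2;
    DISPATCH(pc, acc, cnt);
}

static int64_t op_loop(const uint8_t *pc, int64_t acc, int64_t cnt) {
    /* Decrement the counter; while it is nonzero, jump back pc[1] bytes. */
    if (--cnt != 0)
        pc -= pc[1];
    else
        pc += 2;
    DISPATCH(pc, acc, cnt);
}

static int64_t op_halt(const uint8_t *pc, int64_t acc, int64_t cnt) {
    (void)pc;
    (void)cnt;
    return acc;
}

/* Cold fallback, kept out of line so it never bloats a hot handler. */
static NOINLINE int64_t report_bad_opcode(uint8_t op) {
    fprintf(stderr, "bad opcode %u\n", (unsigned)op);
    return -1;
}

static int64_t op_invalid(const uint8_t *pc, int64_t acc, int64_t cnt) {
    (void)acc;
    (void)cnt;
    return report_bad_opcode(*pc);
}

/* Entry point: start executing at the first opcode with acc = cnt = 0. */
int64_t run(const uint8_t *code) {
    assert(code != NULL);
    return lookup(code[0])(code, 0, 0);
}
```

With the program `{OP_SETCNT, 10, OP_ADDI, 3, OP_LOOP, 2, OP_HALT}`, `run()` adds 3 ten times and returns 30. Because each handler is a separate function ending in its own dispatch, the compiler typically keeps one indirect jump per handler rather than deciding for itself whether to merge them, which is the property the big `switch()` version leaves up to the optimizer.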