| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by batterseapower 745 days ago

My PhD was on supercompilation for Haskell (thanks for the cite in the repo :-))

Cool to see that people are still working on it, but I think that the main barrier to practical use of these techniques still remains unsolved. The problem is that it's just so easy for the supercomplier to go into some crazy exponential blowup of function unfolding, making the compilation step take impractically long.

Even if you avoid a literal exponential blowup you can easily end up generating tons of specializations that bloat your code cache but don't reveal any useful optimization opportunities/are infrequently used in practice. Similar performance problems also plague related techniques like trace-based JITs, even though the trace JIT happens at runtime and thus has access to strictly more information about the frequency with which a trace might be used.

You can try to use annotations like the @extract proposed in the article to control these problems, but it can be hard to predict in advance when this is going to occur.

One interesting research direction might be to use deep reinforcement learning to try to guide the generalization/termination decisions, where the reward is based on A) whether the unfolding leads to a tie-back later on, and B) to what extent the unfolding allowed deforestation/beta reduction to take place.

2 comments

Hirrolot 744 days ago

Thanks for commenting! Yes, automatically predicting whether it's beneficial to specialize a function is not implemented. It's under the section "Further research" to find suitable heuristics to automatically emit annotations like those `@extract` that we already have.

This topic was brought up many times in the past. Most of the time it was suggested to use speculative approaches to detect blowups, but I haven't actually seen any attempts to predict them before supercompilation. For example, while I was writing Mazeppa, I've seen many times that blowups occur because some function call gives insufficient information that is subsequently pattern-matched by another function, and since it cannot be properly pattern-matched at compile-time, a lot of code duplication takes place.

I'm leaning towards a kind of "abstract interpretation pass" before supercompilation to approximate "good" and "bad" interactions between functions. After all, given that offline analyses exist even for harder problems such as detecting non-termination, why cannot we understand how supercompilation gives us desirable results?

link

c0balt 745 days ago

Maybe this is naive but couldn't you use the same approach with profiles collected at runtime as PGO like LLVM/GCC? In my experience it can help quite a lot with guiding inlining both by hand and via a compiler.

link

batterseapower 745 days ago

I don't think that's naive - it could definitely help ameliorate the issues. To my knowledge noone has tried this approach.

The other idea I had when I was working on this stuff was to do JIT supercompliation. Clasically, supercompilers expand the whole graph at compile time, but it would be straightforward to delay the supercompilation process until the first time that the code is actually executed, which helps tame the problem of generating lots of specializations that may never actually be executed in practice.

(Choosing a generalization heuristic becomes more important in the JIT setting because you always have access to the full info about all parameters to the function you are specializing, and naively you would end up e.g. specializing sigmoid(x)=1/(1+exp(x)) for each value of x observed at runtime.)

link

rowanG077 744 days ago

Do current JIT implementation already have something to control specialization? I have heard many times for example that JavaScript engines will specialize hot paths.

link

moomin 744 days ago

To the best of my knowledge (which isn’t great) .NET has multiple compilation levels for functions, using more time after a method has been called a few times. But starts with a pretty fast but naive compilation. It also, I believe, specialises everything all the time, but (probably because of the multi-level compilation) that doesn’t seem to be a practical problem.

link