by carlopi 1560 days ago
Thanks a lot!

I also believe it would be cool to upstream this; we will have to sit down and do some planning.

Compilation time could improve (I was actually working on this today), but it's already in line with other optimizations, taking < 10% of the time spent doing optimizations on a big codebase we use as a benchmark.

Currently it's applied to all functions, since its runtime is roughly linear in the number of Instructions a Function has anyway. More costly versions of this (which we have on paper but have yet to implement) could use some logic to filter functions in advance.

3 comments

When upstreaming it, it might make sense to give it two mechanisms: either process every function, or process only functions labeled to use it. Then, a frontend can experiment with things like automatically detecting which functions would benefit from it, using language-specific mechanisms. Or, worst-case, frontends can provide attributes to tag functions explicitly, and libraries can tag functions known to benefit from this.
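A minimal sketch of the two mechanisms suggested above, written as a toy model rather than against the real LLVM API. The `Mode` enum, the `Function` class and the attribute name `opt-in-pass` are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    ALL = "all"        # process every function
    TAGGED = "tagged"  # process only functions carrying the opt-in attribute


# Hypothetical attribute name; a real pass would define its own.
OPT_IN_ATTR = "opt-in-pass"


@dataclass
class Function:
    name: str
    attributes: set = field(default_factory=set)


def select_functions(functions, mode):
    """Return the functions the pass should process under the given mode."""
    if mode is Mode.ALL:
        return list(functions)
    # TAGGED: keep only functions that a frontend or library marked explicitly.
    return [f for f in functions if OPT_IN_ATTR in f.attributes]
```

With this split, a frontend that detects good candidates automatically only has to attach the attribute; the pass itself stays unchanged.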
My understanding is that CheerpX, by simulating a full userspace, can optimize all the way through system libraries, which are typically very general and have lots of code that is “dead” to your application. How would this approach fare for applications where the code tends to be more specific to the task?
Seeing through system libraries sounds problematic already, since some of them, like libc, violate aliasing rules once you can see their implementations, even in LLVM bitcode form.
I have to double-check this, but the approach should be theoretically sound, since we are quite strict about which functions are considered to have known call sites: the Function has to be internal (as in no visibility from outside) and have no indirect uses (so no saving a pointer to the function somewhere). This should be enough to solve the problem you are thinking about.
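The "known call sites" condition can be sketched as a toy model like the one below. This is not the actual implementation: `FunctionInfo`, `Use` and `has_known_call_sites` are hypothetical names, and a real pass would query linkage and uses through the compiler's own IR.

```python
from dataclasses import dataclass, field


@dataclass
class Use:
    # True when the function is the callee of a direct call;
    # False for any other use (address stored, passed as an argument, ...).
    is_direct_call: bool


@dataclass
class FunctionInfo:
    name: str
    internal: bool  # not visible outside the module
    uses: list = field(default_factory=list)


def has_known_call_sites(fn):
    """True only if fn is internal and every use of it is a direct call."""
    return fn.internal and all(u.is_direct_call for u in fn.uses)
```

A single non-call use, such as storing the function's address, is enough to disqualify the function, because the pointer could then be called from a site the analysis never sees.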
That's actually a good point: more aggressive optimizations that peek through libraries that are not typically visible to the compiler are generally going to break things like allocators. Although, I guess something must be compiling them anyway, so perhaps it might end up OK?
Consider integrating with PGO there, at least for getting most of the gains (assuming they are about performance, not binary size) with a fraction of the cost.
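One way to read the PGO suggestion is to run the pass only on the hottest functions. The sketch below assumes a hypothetical profile format (a mapping from function name to execution count) and a hypothetical `coverage` threshold; it is not tied to any real PGO file format.

```python
def hottest_functions(profile, coverage=0.9):
    """Pick the hottest functions until `coverage` of all counts is reached.

    `profile` maps function name -> execution count (assumed format).
    """
    total = sum(profile.values())
    if total == 0:
        return []
    selected, covered = [], 0
    # Walk functions from hottest to coldest.
    for name, count in sorted(profile.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= coverage * total:
            break
        selected.append(name)
        covered += count
    return selected
```

Functions outside the selected set would simply be skipped, trading a small part of the gains for less compile time.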