| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by negate32 516 days ago

One of the big things which makes LLVM very slow is the abundance of passes, I believe I last counted 75 for an unoptimized function? My solution for this is writing more combined passes, due to the SoN design I'm combining a lot of things which are traditionally separate passes. For instance, my equivalent to "SimplifyCFG" "GVNPass", "InstCombine", "EarlyCSEPass" and "JumpThreadingPass" is one combined peephole solver which runs faster than all of these passes separately. This is for two main reasons:

* Less cache churn, I'm doing more work per cacheline loaded in (rather than rescanning the same function over and over again).

* Combining mutually beneficial optimizations can lead to less phase ordering problems and a better solve (this is why SCCP is better than DCE and constant prop separately).

In a few years when TB is mature, I'd wager I'll have maybe 10-20 real passes for the "-O2 competitive" optimizer pipeline because in practice there's no need to have so many passes.

2 comments

gergo_barany 515 days ago

If this is one of the main things you want to demonstrate, wouldn't it be better to focus on this one goal first, instead of the whole pipeline from a C preprocessor to directly linked executables?

Essentially, if you say that LLVM's mid-end in particular is slow, I would expect you to present a drop-in replacement for LLVM's mid-end opt tool. You could leave C-to-LLVM-bitcode to Clang. You could leave LLVM-bitcode-to-machine-code to llc. Just like opt, take unoptimized LLVM bitcode as input and produce optimized LLVM bitcode as output. You would get a much fairer apples to apples comparison of both code quality and mid-end compiler speed (your website already mentions that you aren't measuring apples-to-apples times), and you would duplicate much less work.

Alternatively, look into existing Sea of Nodes compilers and see if you can build your demonstrator into them. LibFIRM is such a C compiler: https://libfirm.github.io/ There may be others.

It just seems like you are mixing two things: On the one hand, you are making some very concrete technical statements that integrated optimizations are good and the Sea of Nodes is a great way to get there. A credible demonstrator for this would be very welcome and of great interest to the wider compiler community. On the other hand, you are doing a rite-of-passage project of writing a self-hosting C compiler. I don't mean this unkindly, but that part is less interesting for anyone besides yourself.

EDIT: I also wanted to mention that the approach I suggest is exactly how LLVM became well-known and popular. It was not because of Clang; Clang did not even exist for the first eight years or so of LLVM's existence. Instead, LLVM focused on what it wanted to demonstrate: a different approach to mid-end optimizations compared to the state of the art at the time. Parsing C code was not part of that, so LLVM left that to an external component (which happened to be GCC).

link

DannyBee 516 days ago

I'm very confused what magic you believe will achieve what has not so far been achieved.

I'm also confused why you believe LLVM didn't start out the exact same way?

I say this as one of the main people responsible for working on combined pass replacements in both GCC and LLVM for things that were reasonable to be combined.

I actually love destroying lots of passes in favor of better combined ones. In that sense, i'm the biggest fan in the world of these kinds of efforts.

But the reason these passes exist is not cruft, or 20 years of laziness, - it's because it's very hard to replace them with combined algorithms that are both faster, and achieve the same results.

What exactly do you plan on replacing GVN + Simplify CFG + Jump Threading + correlated value prop with?

It took years of cherrypicking research and significant algorithm development to develop algorithms for this that had reasonable timebounds, were faster, and could do better than all of them combined. The algorithm is quite complex, and it's hard to prove it terminates in all cases, actually. The number of people who understand it is pretty small because of the complexity. That's before you get to applied engineering of putting it in a production compiler.

These days, as the person originally responsible for it, i'd say it's not better enough for the complexity, even though it is definitely faster and more complete and would let you replace these passes.

Meanwhile, you seem to think you will mature everything and get there in a few years.

I could believe you will achieve some percent of GCC or LLVM's performance, but that's not the reason these passes exist. They exist because that is what it reasonably takes to achieve LLVM (and GCC's) performance across a wide variety of code, for some acceptable level of algorithm complexity and maintainability.

So if you told me you were only shooting for 80% across some particular subset code, i could believe 10-20 passes. If you told me you were going to build a different product that targets a different audience, or in a different way, i could maybe believe it.

But for what you say here, I think you vastly underestimate the difficult and vastly underappreciate the effort that goes into these things. This is hundreds of very smart people working on things for decades. It's one thing to have a healthy disrespect for the impossible. It's another to think you will, in a few years, outdo hundreds of smart, capable engineers on pure technical achievement.

That strikes me as somewhere between hubris and insanity.

People also pretty much stopped building and publishing general purpose compiler optimization algorithms a decade ago, moving towards much more specialized algorithms and ML focused things and whatnot.

This is because in large part, there isn't a lot left worth doing.

So unless you've got magic bullets nobody else has, either you won't achieve the same performance level, or it will be slow, or it will take you a significant amount of algorithm development and engineering well beyond "a few years".

I say this not to dissuade you, but to temper your apparent expectations and view.

Honestly - I wish you the best of luck, and hope you succeed at it.

link

rendaw 516 days ago

Critical responses from people in the industry are what I come here to read. I have no doubt of your credentials and I'm not at all qualified to judge the technical details here, but your reply comes off as an emotional kneejerk.

I read it a few times and as best I can get this is what you're saying:

- You came up with a similar combined replacement pass for LLVM based on years of personal and external research.

- It's faster and has more functionality.

- It's very complex and you're not comfortable that it's possible to achieve various reliability guarantees, so it is unreleased

- Therefore (?) you think the Tilde author also couldn't possibly succeed

AFAICT you also believe that the Tilde author hasn't completed their replacement pass. From their post my take was that it was already done. The part that will mature is additional passes, or maybe optimizations/bugfixes, but not the MVP development.

Your main arguments seem to be probability and appeal to authority (external research, assigned responsibility, industry association). Pretty much all projects and startups fail, but it's because people attempt them that some succeed.

Is the author betting their career on this? Why do their expectations need to be tempered?

I'd be interested in hearing concrete criticisms of their algorithms (have you looked at the code?) or oversights in the design. Maybe the author would too! If you let the author know, maybe you could think of a solution to reduce the complexity or improve the guarantees together.

link

DannyBee 516 days ago

" but your reply comes off as an emotional kneejerk."

Lots of these come by, it gets tiring to try to provide detailed critiques of all of them. Feel free to go through comment history and see the last one of these to come along and the detailed critiques there :)

Meanwhile, let me try to help more:

First, the stuff i'm talking about is released. It's been released for years. It is included in LLVM releases. None of this matter, it was an example of what it actually takes in terms of time and energy to perform some amount of pass combination for real, which the author pays amazingly short shrift to.

I chose the example I did not because i worked on it, because it's in the list of things the author thinks are possible to combine easily!

Second it's not one combined pass they have to make - they think they will turn 75 passes into 20, with equivalent power to LLVM, but somehow much faster, yet still maintainable, mainly because "it's time for a rewrite" and they will avoid 20 years of cruft.

So they don't have to repeat the example i gave once. They have to repeat it 20-30 times. Which they believe they will achieve and reach maturity of in ... a few years.

They give no particular reason this is possible - i explained why it is remarkably difficult - while certainly you can combine some dataflow optimizations in various ways, doing so is not just hacking around.

It's often hard computer science problems to take two optimization passes, combine them in some way, and prove the result even ever terminates, not even that it actually optimizes any better. Here they are literally talking about combining 3 or 4 at a time.

While there are some basic helpful tools we proved a long time ago about things like composability of monotonic dataflow problems, these will not help you that much here.

Solving these hard problems are what it takes to have it work. It's not just cherry picking research papers and implementing them or copying other compiler code or something.

Let's take a concrete example, as you request:

If you want to subsume the various global value numbering passes, which all eliminate slightly different redundancies and prove or otherwise guarantee that you have actually done so, you would need a global value numbering pass you can prove to be complete. Completeness here means that it detects all equivalent values that can be detected.

There is no way around this. Either it subsumes them or it doesn't. If it doesn't, you aren't matching what LLVM's passes do, which the author has stated as the goal. As I said, i could believe a lesser goal, but that's not what we have here.

The limit of value numbering completeness here was proved a long time ago. The best you can do is something called herbrand equivalences. Anything stronger than that can't be proven to be decidable, and to the degree it can, you can't prove it ever terminates.

That has the upside that you only have to achieve this to prove you've done the best you can.

It has the downside that there are very few algorithms that achieve this.

So there are a small number of algorithms that have been proven complete here (about 7) - and all but 3 have exponential worst time.

The three polymonial algorithms are remarkably complicated, and as far as i know, never been implemented in any production compiler, anywhere. Two are N^4, and one is N^3.

One of the N^3 ones has some followup papers where people question whether it really works or terminates in all cases.

These are your existing choices if you try to use existing algorithms to combine these 4 out of the 70 passes, into 1 pass.

Otherwise, you get to make your own, from scratch.

The author seems to believe you can still, somehow, do it, and make the result faster than the existing passes, which, because they do not individually try to be complete, are O(N) and in one case, N^2 in the worst case. So combined, they are N^2.

While it is certainly possible to end up with N^3 algorithms that are faster than N^2 algorithms in practice, here, none of the algorithms have also ever been proven practical or usable in a production compiler, and the fastest one has open questions about whether it works at all.

Given all this, i see the onus as squarely on the author to show this is really possible.

Again, this is just one example of what it takes to subsume 4 passes into 1, along the exact lines the author says they think they will do, and it would have to be repeated 30 more times to get down to 20 passes that are as good as LLVM.

That's without saying anything else about the result being faster, less complex, or having less cruft.

As for whether they've accomplished a combined pass of any kind -I've looked at the code in detail - it implements a fairly basic set of optimization passes that nowhere approaches the functionality of any of the existing LLVM passes in optimization power or capability. It's cool work for one person, for sure, but it's not really that interesting, and there are other attempts i would spend my time on before this one. I don't say that to knock the author - i mean it in the literal sense to answer your question - IE It is not interesting in the sense that there is nothing here that suggests the end result will achieve fundamental improvements over LLVM or GCC, as you would hope to see in a case like this. The choices made so far are a set of tradeoffs that have been chosen before in other compilers, and there is nothing (yet) that suggests it will not end up with similar results to those compilers.

It is any not further along, more well developed, etc, than other attempts have been in the past.

So when I look at it, i view that all as (at least so far) not interesting - nothing here yet suggests a chance of success at the goals given.

As I said, these things come along not infrequently - the ones that are most viable are the ones that have different goals (IE fast compilation even if it does not optimize as well. Or proven correct transforms. or ...). Or those folks who think they can do it, but it will take 10-15 years. Those are believable things.

The rest seem to believe there are magic bullets out there - that's cool - show me them.

As for " Pretty much all projects and startups fail, but it's because people attempt them that some succeed."

This is true, but also tautological, as you know - of course things can only succeed if someone attempts them. It is equally as true that just because people attempt things does not mean anyone will succeed. While it is true nobody will be able to breathe unassisted in space if nobody tries, that does not mean anyone can or will ever succeed at it no matter how many people try.

This case is not like like a startup that succeeds because it built a better product. This is like a startup that succeeds because it proved P=NP.

Those are not the same kind of thing at all, and so the common refrains about startups and such are not really that useful here.

The one you use is useful when arguing that if enough people try to build a better search and outdo Google (or whatever), eventually someone will succeed - this is likely true.

In this case, however, it is closer to arguing that if enough people jump off 500ft cliffs and die, eventually someone will achieve great success at jumping off 500ft cliffs.

link