Hacker News new | ask | show | jobs
by moonchild 1073 days ago
> are computers just not yet fast enough to do a good job with very simple loops in practical compilers?

The short answer to this question is 'yes', but there are some extenuating factors:

- Although we could do interesting things with unlimited computational resources, the current crop of c compilers is simply not very good, compared with what's possible today.

- Performance is always workload-dependent; the compiler has been somewhat shafted here because it doesn't know what sorts of inputs the function usually receives. The compiler output is better than the 'improved' code for some inputs. (It's possible you could get a better result from the existing compilers and c code just by using profile-guided optimisation.)

- The difference is prone to be more pronounced in simple loops than large ones. This is a contrived use-case. There is not a factor of 6 of performance hiding in optimised c code which could be recovered by doing the sorts of optimisations done by the op. Probably something more like 10-20%.

1 comments

> the current crop of c compilers is simply not very good, compared with what's possible today.

That's quite dismissive. What exactly "is possible today" and why aren't these top compilers using them?

One prominent example: the use of intermediate representations based on basic blocks introduces redundancies that increase the complexity of the compiler, requiring attendant redundancies in order to optimise the same. You can see the redundancy manifest here https://godbolt.org/z/8o3oe39hh as different code generation from f and g. (They may change the result of this particular test in the future, but it seems unlikely that the disease—rather than the symptom—can be treated without a complete rearchitecture.)

E-graphs ameliorate phase ordering issues and allow for exploring the space of non-monotonic rewrites; recent research makes them computationally viable.

Put simply: it's legacy. Gcc and llvm are millions of lines of code, and they assume a particular architecture. Changing that is not easy.

Another issue, which I did not mention (but which is pertinent) is that c is a poor language for compilation. (Fran allen famously said 'c has destroyed our ability to advance the state of the art'.) In some respects, the optimisations performed automatically by modern high-performance cpus are more sophisticated than those done by c compilers, howbeit with less reach; the only reason they are able to do this is that they have direct control of the execution and hence have a greater ability to abstract over the side effects which are rampant in most c code.

E-graphs are interesting, but one still has to deal with combinatorial explosions. Are you alluding to some powerful search heuristic?

Your example touches on the problems of inflexible ABI, namely caller saved registers and the unknowability of side effects of external functions. Very weird that it can't reorder `r = x+y` despite it having no "observable" side effect until `return r`, since that return dominates the assignment, and there's no real relation between (the return, assignment) and (eff()).

I looked at it closer. In C, it is a side effect to assign to a variable. For an extern function not annotated __attribute__((pure)) the compiler has to assume the function call generates side effects. This prevents it from reordering the assignment and function call. Since x86-64 ABI has caller saved registers, in the case where it calls eff() first, it has to save x and y, and after the call, restore them.
The work on e-graphs I refer to is <https://egraphs-good.github.io/>—amortised rebuilds.

My example has nothing whatsoever to do with abi, and everything to do with ir. f and g are exactly semantically equivalent, and this equivalence is trivial to show; that the compilers generate different code for each demonstrates redundancies in their ir.

> it is a side effect to assign to a variable

But that variable is not aliased here.