Hacker News new | ask | show | jobs
by kevincox 1080 days ago
> take as long as needed to build the fastest possible code

I don't think this is true. You can always go slower and produce better code.

However in practice running the compiler at the max optimization setting is very common, so maybe there is appetite for more aggressive settings at the lower cost. I do think it makes sense to add this in some preset "profile" as it doesn't have any downsides besides time. I wonder if it makes sense to do something like gzip does, where compression level 9 is made available but is very rarely worth the time trade off. Why don't compiler settings frequently go past the point where it is reasonable for most people?

2 comments

Allowing higher compression levels generally has a minimal impart on the maintenance cost of the compression code, so there's essentially only a cost to the user for allowing arbitrarily high compression level settings. Higher optimization levels in a compiler generally greatly add to the code complexity, and thus impose a tax on the developers, not just a cost to the users.

For compression settings, a higher number is usually just setting limits on the search space, and perhaps setting upper limits on the amount of memory required. Setting higher compression levels rarely executes lots of extra code.

For compilers, higher levels typically enable entirely new optimizations. That means lots of extra writing of optimizations that will almost never get used, are often tough to debug (particularly their interaction with other optimizations), and increase the maintenance cost of the compiler.

This isn't necessarily true. For example the case listed above is just tweaking a config knob to 1. Often times there is also some amount of brute-force search that can be enabled. For example you can use a different inlining heuristic that attempts to inline more things then undoes it if not profitable. There are lots of cases like this where the cost-based optimizer can more aggressively search for possible solutions without actually writing new optimizations.
Realistically there is only so much optimization a compiler can do without having execution profiles of code. There is a HUGE amount of optimization a compiler can do if it knows how likely various branches are to be taken. For example, consider a simple switch statement. There are basically two ways to do code generation: you can use a jump table, or the case can be rewritten as if it's a series of if/else statements (i.e. by generating a cascade of compare/jump instructions). And in the latter case, what order the case statements are placed by the compiler will make a huge difference in performance, as you want the most likely case statements to come first. Note that I'm just mentioning switch statements here as they're easy to understand, this type of problem is ubiquitous when it comes to code generation decisions made by the compiler.

Back to our example: if the compiler doesn't have any information about how likely different cases are then it can't actually know what is going to be fastest. So your compiler just uses some heuristics to guess and pick something that is reasonable (but is often not optimal). This problem arises all over the place in optimization passes: the compiler could generate code X or code Y, but if it's just looking at your source code it doesn't know for sure which is going to be faster, so it just chooses something based on an arbitrary heuristic. If strategy X performs better than strategy Y 90% of the time, then the compiler is still making the wrong decision 10% of the time.

This is the fundamental reason why having an extra slow "max optimization" mode doesn't work. The reason the compiler doesn't generate better code isn't necessarily because the compiler doesn't have enough time to optimize things, it's because it's working with limited information. In my experience (from C++), the compilation times with clang vary only a small amount when compiling with -O1 vs -O2 vs -O3 (maybe 5%). And in fact in clang there are many optimization passes that are not enabled by default at -O3. The reason they aren't enabled is because they usually have dubious benefit (i.e. equally as likely to improve performance as to worsen performance).

I'm not a Rust programmer, but at least in C++ land the first step to generating optimized binaries should be using -O3 and microarch flags (e.g. -march), and the next most important thing is using FDO to optimize builds. Using FDO to optimize builds will have a much bigger impact on performance than fiddling with other optimization flags. FDO works by collecting branch profiles from compiled binaries, and then feeding that information back into the compiler so it can make better code generation and layout decisions. You can gather FDO profiles on any binary (on Linux you just need to collect branch events using perf), so if you really want better optimization that's my recommendation.

Compiling C++ at -O0 can often be slower in my experience than compiling at -O3