
	by lifthrasiir 1026 days ago
> Compilers try to hoist constant conditions outside of loops but they're bad at it. Even in the trivial example above, on -O2 gcc does a redundant check with every loop iteration.

GCC is actually good at that; -O3 has no issue recognizing it. In fact there is even a very explicit option (-funswitch-loops) responsible for moving loop-invariant conditions out of loops. It is not enabled on -O2 because it has a space-speed tradeoff. If this optimization is truly desirable even on -O2, `#pragma GCC optimize("-funswitch-loops")` can be used to force it.
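To make that concrete, here is a minimal C sketch of what loop unswitching amounts to. The function names and the `negate` flag are made up for illustration and are not taken from the article being discussed. The pragma is the one mentioned above, guarded so that other compilers skip it; whether GCC actually unswitches this particular loop depends on its version and cost heuristics, so check the generated assembly rather than taking it on faith.

```c
#include <assert.h>
#include <stddef.h>

/* GCC-only: request loop unswitching for the function that follows,
   whatever -O level the rest of the file is built with. */
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC push_options
#pragma GCC optimize("-funswitch-loops")
#endif

/* `negate` never changes inside the loop, so the branch is loop-invariant. */
long sum_signed(const int *data, size_t n, int negate)
{
    assert(data != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (negate)
            total -= data[i];
        else
            total += data[i];
    }
    return total;
}

#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC pop_options
#endif

/* Hand-unswitched equivalent: test the flag once, then run one of two
   branch-free loops. The loop body is duplicated, which is the
   space-speed tradeoff. */
long sum_signed_unswitched(const int *data, size_t n, int negate)
{
    assert(data != NULL || n == 0);
    long total = 0;
    if (negate) {
        for (size_t i = 0; i < n; i++)
            total -= data[i];
    } else {
        for (size_t i = 0; i < n; i++)
            total += data[i];
    }
    return total;
}
```

Both functions return the same value for the same input; the second is simply the shape the optimizer is expected to produce from the first.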

msla 1026 days ago

I don't know why people are so reluctant to just use -O3

qweqwe14 1026 days ago

Because -O3 enables optimizations that break code with some lesser known cases of UB[1]. So people just don't enable -O3 because they can't be bothered to fix UB in their codebase, or because they think it's a "compiler bug".

There are other reasons for not using -O3 which people already mentioned.

As a matter of fact, Gentoo specifically says in their docs that -O3 breaks some packages[2].

[1] https://stackoverflow.com/questions/57889116/different-evalu...

[2] https://wiki.gentoo.org/wiki/GCC_optimization#-O

gpderetta 1025 days ago

UB-breaking optimizations are already enabled at O1. I don't think -O3 is particularly noticeable for that: your first link shows code that breaks between O3 and no optimization at all.

O3 does more aggressive inlining, plus optimizations that are generally expensive or not necessarily profitable.

qweqwe14 1025 days ago

I was only saying that -O3 enables _even more_ optimizations, which can lead to UB code that works with -O2 but misbehaves on -O3. And because -O3 is seldom used (for reasons other than code breakage), the code patterns that are actually UB but don't get compiled to bad code at -O2 (by e.g. gcc) are lesser known. It all depends on how much a given compiler is willing to assume based on UB and on what the C spec permits it to optimize. At -O3 it probably makes a lot more of those assumptions, along the lines of "the spec says this is undefined behavior so I will assume it can never happen and optimize it based on that". The more aggressive inlining in conjunction with that probably exposes even more UB.
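For anyone unfamiliar with that kind of assumption, a minimal C sketch using signed integer overflow, which is undefined behavior in C. The function names are mine and purely illustrative. The first function is the classic broken overflow check, kept only for contrast; the other two express the same intent without ever performing an overflowing operation.

```c
#include <limits.h>

/* Broken: signed overflow is undefined, so a compiler is allowed to
   assume `x + 1` never wraps and fold this whole test to 0.
   Shown for contrast only; do not rely on its result. */
int next_would_overflow_ub(int x)
{
    return x + 1 < x;
}

/* Well-defined: compare against the limit before doing any arithmetic. */
int next_would_overflow(int x)
{
    return x == INT_MAX;
}

/* Same idea for a general sum a + b, with no overflowing intermediate. */
int add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;   /* INT_MAX - b cannot overflow when b > 0 */
    return a < INT_MIN - b;       /* INT_MIN - b cannot overflow when b <= 0 */
}
```

The fix is the same at any optimization level: restructure the check so the undefined operation is never evaluated.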

Edit: found example where code works on -O2 but breaks on -O3, related to using abnormal amounts of stack space https://stackoverflow.com/questions/47058978/g-optimization-...

jandrese 1025 days ago

Isn’t the warning that -O3 might break stuff horribly obsolete? It was an issue in 2.95, but that was many years ago.

lifthrasiir 1026 days ago

Because i) not all programs benefit tremendously from -O3 and ii) it adds to the compile time and binary size? It would be great to have some optimization level between -O2 and -O3 so that only portions that have a potential to be improved more than, say, 5% are compiled using -O3. In fact the existence of `#pragma GCC optimize` does suggest that this might be possible today with some heuristics... (Or use PGO, which will have the same effect. But PGO is still a novelty in 2023.)
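A rough sketch of the manual version of that idea in C, assuming GCC: the `optimize` function attribute raises the level for one function while the rest of the file keeps the command-line setting. `HOT_PATH`, `dot` and `count_nonzero` are made-up names, and choosing which functions deserve the attribute is left to the programmer here; no 5% heuristic is implemented. Note that the GCC documentation describes this attribute as intended for debugging rather than production use, so treat this as an experiment, not a recommendation.

```c
#include <stddef.h>

/* HOT_PATH is a hypothetical macro name. On GCC it expands to the
   `optimize` function attribute; elsewhere it expands to nothing. */
#if defined(__GNUC__) && !defined(__clang__)
#define HOT_PATH __attribute__((optimize("O3")))
#else
#define HOT_PATH
#endif

/* Hypothetical hot function: built as if with -O3 even when the
   translation unit is compiled at -O2. */
HOT_PATH
double dot(const double *a, const double *b, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}

/* Cold function: keeps whatever level was given on the command line. */
size_t count_nonzero(const double *a, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] != 0.0)
            count++;
    return count;
}
```

Compiling the file with `gcc -O2` would then apply the higher level only to `dot`.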

glandium 1026 days ago

Also, in practice, -O3 doesn't necessarily lead to faster code. https://people.cs.umass.edu/~emery/pubs/stabilizer-asplos13.... https://m.youtube.com/watch?v=r-TLSBdHe1A

KeplerBoy 1026 days ago

You can manually set the optimizations (and order of optimizations) you want. -O3 is just a predefined set of optimizations.

lifthrasiir 1026 days ago

But that is not portable among compilers (for example, both GCC and Clang support SLP-based autovectorization but with different flags). -O# is one of the few flags that have roughly the same meaning across many compilers.

crest 1025 days ago

Because in the real world code size is often more important than throughput on large inputs in microbenchmarks. There is a point of diminishing returns, and in the opinion of many -O3 is beyond it. If it does the job and the job is worth doing, use it.

wycy 1026 days ago

Using -O3 was resulting in correctness problems in my work codebase in the past. Nowadays the compiler (ifort) crashes when using -O3 for some reason.

qweqwe14 1026 days ago

> Using -O3 was resulting in correctness problems

Likely because your codebase had UB in it that didn't show itself until a certain optimization level. The solution is to fix all instances of UB. See my comment above.

wycy 1025 days ago

I believe in our case it was a compiler bug, because it only happened for a few versions of ifort and was eventually fixed. But it scared us off from using -O3.

grotorea 1026 days ago

How often does O3 vs O2 enable the sort of optimizations that break code that is technically incorrect?

thesnide 1025 days ago

I just gave up and use tcc.

Forces me to use a better algorithm when it matters. Which more often than not, does not.

;-)

voidstarcpp 1026 days ago

>GCC is actually good at that, -O3 has no issue recognizing it.

imo, if you have to go to O3 or enable a pragma to get an "obvious" optimization then this is undesirable and probably something the programmer still needs to be conscious of, especially since we spend so much time doing development builds that are not at the maximum release optimization level.

lifthrasiir 1025 days ago

"Misaligned" would be a better word than "obvious". It is well known and understood that a higher optimization level means more compilation time, a potential speed gain, and more subtle breakage in the presence of UB; being the maximum level, -O3 also implies a potential binary size increase. Under this understanding I believe `-funswitch-loops` is correctly placed at -O3. But ideally we want less compilation time, a potential speed gain, less subtle breakage and an insignificant binary size increase all at once, and you can argue that all existing optimization levels are too far from that ideal.
