by copx 4636 days ago
It is common knowledge among GCC users. In the past using -O3 was rare because it often generated downright broken code. There used to be an official warning about that.

The situation is better nowadays but still, as far as I know, no major Linux distro uses -O3 as the default for binary packages.

-O3 can generate slower code because of the aggressive inlining and loop unrolling enabled. These optimizations are very tricky because of their effect on cache use. Basically all that extra code can push other needed code/data out of the cache, which can cause a noticeable decrease in performance.


I think it's 'common knowledge' which has outlived its relevance, as I can't recall the last time I found -O2 outperforming -O3.

Practically every performance oriented open source program I come across also defaults to -O3 these days, or sometimes -Ofast which also enables -ffast-math.

>-O3 can generate slower code because of the aggressive inlining and loop unrolling enabled

-O3 turns on vectorization and inlining optimizations but I can't recall any loop unrolling options which are turned on at -O3.

-funroll-loops is not turned on at any of the -O levels (including -O3), because it is one of the hardest optimizations to get right without runtime data as a basis (which is why the only option that turns it on is PGO, profile-guided optimization).
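Rather than relying on memory, you can ask GCC itself which optimizer flags change between two -O levels. This is a rough sketch, assuming a real GCC is on the PATH (or named in CC); the helper name opt_diff is made up for illustration, and the output differs from one GCC version to the next.

```shell
#!/bin/sh
# Hypothetical helper: list the optimizer flags whose status differs
# between two -O levels. Usage: opt_diff O2 O3
opt_diff() {
    cc="${CC:-gcc}"
    a=$(mktemp) && b=$(mktemp) || return 1
    # -Q --help=optimizers prints each optimizer flag with [enabled]/[disabled].
    "$cc" -Q --help=optimizers "-$1" > "$a" 2>/dev/null || :
    "$cc" -Q --help=optimizers "-$2" > "$b" 2>/dev/null || :
    # diff exits 1 when the files differ, so that is not treated as a failure.
    diff "$a" "$b" | grep -E 'enabled|disabled' || :
    rm -f "$a" "$b"
}

if command -v "${CC:-gcc}" >/dev/null 2>&1; then
    opt_diff O2 O3
    # Look specifically at the unrolling-related flags at -O3.
    "${CC:-gcc}" -Q --help=optimizers -O3 2>/dev/null | grep -i unroll || :
fi
```

Lines starting with ">" in the diff output show the status at the second level. On a compiler that only pretends to be gcc (for example clang installed under that name) the query may simply print nothing.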

Note that I'm talking about modern versions of GCC; if you are using GCC 4.2.1 on OS X, then this (-O2 outperforming -O3) may still typically be the case.

>The situation is better nowadays but still, as far as I know, no major Linux distro uses -O3 as the default for binary packages.

I'd say they typically use the upstream optimization settings.

>I think it's 'common knowledge' which has outlived its relevance as I can't recall the last time I found -O2 outperforming -O3.

I can; it was about 4 months ago, with GCC 4.8.0.

>practically every performance oriented open source program I come across also defaults to -O3 these days

How large is your sample size there? I have only seen -O3 in the default makefiles of audio/video encoders. Those tend to be a natural fit for -O3. In contrast, here is the current makefile of my favorite "performance oriented" FOSS program:

http://repo.or.cz/w/luajit-2.0.git/blob_plain/HEAD:/src/Make...

CCOPT= -O2 -fomit-frame-pointer
# Note: it's no longer recommended to use -O3 with GCC 4.x.
# The I-Cache bloat usually outweighs the benefits from aggressive inlining.

>I can't recall any loop unrolling options which are turned on at -O3.

You are right (I just looked it up). Guess my memory failed me there.

>I'd say they typically use the upstream optimization settings

I wish! Packagers love to fool around with the upstream sources and makefiles to make them conform to whatever "standards" they have.

>How large is your sample size there? I have only seen -O3 in the default makefiles of audio/video encoders. Those tend to be a natural fit for -O3

Well, I very much meant 'performance-oriented' programs, since we were discussing the performance effect of compiler options, and those programs are indeed a natural fit for -O3.

My 'sample size' there would be software like encoders, archivers, emulators, 3D renderers, etc.

Obviously there's little point in using -O3 on your text editor (yes, extreme example). For basically any software that isn't performance-oriented, -O3 will likely only serve to increase the binary size, as any potential gains will be unnoticeable.
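The size effect is easy to check for a given file. A minimal sketch, assuming gcc (or whatever CC names) and the binutils size tool are installed; compare_size is a hypothetical helper name, and the numbers depend entirely on the source file and compiler version.

```shell
#!/bin/sh
# Hypothetical helper: build one source file at -O2 and -O3 and print
# the section sizes of each object file. Usage: compare_size file.c
compare_size() {
    src="$1"
    cc="${CC:-gcc}"
    for level in O2 O3; do
        obj=$(mktemp) || return 1
        # Compile only (-c) so no main() or linking is required.
        if "$cc" "-$level" -c "$src" -o "$obj"; then
            echo "-$level:"
            size "$obj"
        fi
        rm -f "$obj"
    done
}
```

Compare the "text" column between the two runs; that is the code size that competes for instruction cache. Whether the larger build is also faster still has to be measured separately.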

>I wish! Packagers love to fool around with the upstream sources and makefiles to make them conform to whatever "standards" they have.

Not really my experience with Arch packages, but of course I haven't looked at the PKGBUILDs for even 1% of all available packages, only those performance-oriented packages on which I rely.