| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by vlovich123 728 days ago

The main flags to look at:

* mtune/march - specifying a value of native optimizes for the current machine, x86-64-v1/v2/v3/v4 for generations or you can specify a specific CPU (ARM has different naming conventions). Recommendation: use the generation if distributing binaries, native if building and running locally unless you can get much much more specific

* -O2 / -O3 - turn on most optimizations for speed. Alternatively Os/Oz for smaller binaries (sometimes faster, particularly on ARM)

* -flto=thin - get most of the benefits of LTO with minimal compile time overhead

* pgo - if you have a representative workload you can use this to replace compiler heuristics with real world measurements. AutoFDO is the next evolution of this to make it easier to connect data from production environments to compile time.

* math: -fno-math-errno and -fno-trapping-math are “safe” subsets of ffast-math (i.e. don’t alter the numerical accuracy). -fno-signed-zeros can also probably be considered if valuable.

1 comments

vbezhenar 728 days ago

Also I learned recently that there's `-Og` which enables optimizations suitable for debug build.

link

vlovich123 728 days ago

In practice I’ve had limited success with that flag. It still seems to enable optimizations that make debugging difficult.

link

senkora 727 days ago

Agreed. I like to compile most translation units with -O3 and then only compile the translation units that I want to debug with -O0. That way I can often end up with a binary that's reasonably fast but still debuggable for the parts that I care about.

link

vlovich123 726 days ago

Yup that’s what I’ve resorted to (in Rust I do it at the crate level instead of translation unit). The only downside is forgetting about it and then wondering why stepping through is failing to work properly.

link