by vlovich123
728 days ago
The main flags to look at:

* -mtune / -march: a value of native optimizes for the current machine; x86-64-v2/v3/v4 target a microarchitecture level (the baseline level is spelled plain x86-64 in GCC), or you can name a specific CPU (ARM has different naming conventions). Recommendation: use a level if distributing binaries, and native if building and running locally, unless you can get much more specific about the target.

* -O2 / -O3: turn on most optimizations for speed. Alternatively -Os / -Oz for smaller binaries (sometimes faster, particularly on ARM).

* -flto=thin (Clang): get most of the benefits of LTO with minimal compile time overhead.

* PGO: if you have a representative workload you can use this to replace compiler heuristics with real-world measurements. AutoFDO is the next evolution of this, making it easier to feed data from production environments back into compilation.

* math: -fno-math-errno and -fno-trapping-math are “safe” subsets of -ffast-math (i.e. they don’t alter numerical accuracy). -fno-signed-zeros can probably also be considered if it is valuable for your code.
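One way to sanity-check that these flags actually reached the compiler is to look at the macros it predefines. The sketch below is only an illustration: the function names are made up for this example, and it assumes GCC or Clang, which define `__OPTIMIZE__`, `__OPTIMIZE_SIZE__`, `__AVX2__`, `__NO_MATH_ERRNO__` and `__FAST_MATH__` depending on the flags in effect. Other compilers may not define any of them.

```c
#include <stdbool.h>

/* Hypothetical helpers, not part of any library. They report what the
 * GCC/Clang predefined macros say about the flags used for this build. */

/* "size" for -Os/-Oz, "speed" for -O1 and above, "none" for -O0. */
const char *build_opt_mode(void) {
#if defined(__OPTIMIZE_SIZE__)
    return "size";
#elif defined(__OPTIMIZE__)
    return "speed";
#else
    return "none";
#endif
}

/* True when -march (e.g. x86-64-v3, or native on a capable CPU) enabled AVX2. */
bool build_has_avx2(void) {
#if defined(__AVX2__)
    return true;
#else
    return false;
#endif
}

/* True when the compiler assumes math functions do not set errno
 * (-fno-math-errno, which -ffast-math also implies). */
bool build_no_math_errno(void) {
#if defined(__NO_MATH_ERRNO__)
    return true;
#else
    return false;
#endif
}

/* True when the full -ffast-math set is enabled, as opposed to
 * only the accuracy-preserving subset. */
bool build_fast_math(void) {
#if defined(__FAST_MATH__)
    return true;
#else
    return false;
#endif
}
```

Building with something like `-O2 -march=x86-64-v3 -fno-math-errno` and printing these values should show "speed", AVX2 on, no-math-errno on and fast-math off. If it doesn't, the flags probably aren't being passed through by the build system.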