by canarypilot
2130 days ago
Clang works much the same way by default. All compilers have to pick a baseline, and assuming it is arch_of(your_core) is a recipe for disaster if you are compiling on your high-end machine for release to a broad user base. For many users of GCC the intended user base will be “the same as my current distribution”, so GCC can be configured with --with-{cpu,arch,schedule,tune} (which of these apply depends on the architecture) to set a default in line with user/distro expectations.
|
Performance-critical programs usually deal with that in one of three ways: they provide multiple builds targeting different CPUs or CPU features; they let users compile from source with the right flags; or they detect the CPU at runtime and ship alternative versions of a few important performance-critical functions for different CPU feature sets (browsers, ffmpeg, glibc, and various VMs/runtimes like Java HotSpot or dotnet do that), while the majority of the code is still compiled for a lowest common subset of CPU features.
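To make the runtime-detection option concrete, here is a minimal sketch in C, assuming GCC or Clang on x86. The function names (sum_baseline, sum_avx2, pick_sum, sum_u32) are made up for illustration and are not taken from any of the projects mentioned; real projects tend to use more elaborate mechanisms. The builtins __builtin_cpu_init and __builtin_cpu_supports and the target attribute are GCC/Clang extensions, so on other compilers or architectures the sketch simply falls back to the baseline.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Baseline version: built with whatever the default target is. */
static uint64_t sum_baseline(const uint32_t *v, size_t n) {
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++) s += v[i];
    return s;
}

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
/* Same loop, but the compiler is allowed to use AVX2 in this one
 * function only. The rest of the file keeps the baseline target. */
__attribute__((target("avx2")))
static uint64_t sum_avx2(const uint32_t *v, size_t n) {
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++) s += v[i];
    return s;
}
#define HAVE_AVX2_VARIANT 1
#endif

typedef uint64_t (*sum_fn)(const uint32_t *, size_t);

/* Ask the CPU what it supports and pick an implementation. */
static sum_fn pick_sum(void) {
#ifdef HAVE_AVX2_VARIANT
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return sum_avx2;
#endif
    return sum_baseline;
}

/* Public entry point: resolves the implementation on first use.
 * Note: this lazy initialisation is not thread-safe; it is kept
 * simple on purpose. */
uint64_t sum_u32(const uint32_t *v, size_t n) {
    static sum_fn impl = NULL;
    assert(v != NULL || n == 0);
    if (!impl) impl = pick_sum();
    return impl(v, n);
}
```

Callers only ever see sum_u32, so the whole program can still be compiled for the lowest common baseline while this one hot function gets a faster variant where the CPU allows it. Whether the AVX2 variant is actually faster is something you would have to measure.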
Of course, languages that (usually) run on JIT-ed VMs/runtimes have a bit of an advantage here, as the actual machine code is generated from source code or byte code only at runtime, at which point it is clear what kind of CPU is underneath the program. They can, but do not always, tailor the JIT-compiled code to the available CPU features. (Of course, every language/VM/runtime comes with its own set of pros and cons, and there is no silver bullet.)
To make matters even more complicated: compiling code to use the newest CPU features or the newest optimization techniques does not mean it will actually run faster. E.g. AVX-512 may actually slow down your code (when multi-threaded) on many CPUs[1]. Or heavily "optimized" code may become larger in machine code, to the point where your "unoptimized" code runs faster because it fits in the CPU cache(s) properly while the "optimized" version does not. For that reason, "-Os" optimized code may run faster than "-Ofast" optimized code. Or it may not; it depends on the actual code.
I remember compiling ffmpeg and libx264 myself a bunch of years ago, with the "best" flags for my system, starting with "-march=" and "-Ofast" of course, thinking I was a tough, skillful super geek now. Imagine my surprise when I tested the performance against a default ffmpeg build and my optimized build was 2-5% slower.
[1] https://blog.cloudflare.com/on-the-dangers-of-intels-frequen...