Exactly. What you do not want is for your program to crash with "illegal instruction" errors on older CPUs that lack the newest fancy features but are still in use by a large chunk of your (paying) users.

Performance-critical programs usually deal with that in one of three ways: they provide multiple builds targeting different CPUs or CPU feature sets, or they let users compile from source with the right flags, or they detect CPU features at runtime and provide alternative versions of a few important performance-critical functions for different feature sets, while the majority of the code is still compiled for a lowest common subset of CPU features. Browsers, ffmpeg, glibc, and various VMs/runtimes like Java HotSpot or dotnet do the latter.
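To make the runtime-detection option a bit more concrete, here is a minimal sketch of the dispatch pattern. It is not how ffmpeg or glibc actually do it (they do this in C and assembly); the function names, the candidate table, and the use of Linux's /proc/cpuinfo as the feature source are all assumptions made for illustration.

```python
from pathlib import Path


def parse_cpu_flags(cpuinfo_text):
    """Return the set of feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 Linux labels the list "flags"; ARM Linux labels it "Features".
        if sep and key.strip() in ("flags", "Features"):
            return set(value.split())
    return set()


def detect_cpu_flags(path="/proc/cpuinfo"):
    """Best effort: an empty set means 'unknown', so only the baseline is used."""
    try:
        return parse_cpu_flags(Path(path).read_text())
    except OSError:
        return set()


def sum_baseline(values):
    # Stands in for code built for the lowest common feature set.
    total = 0
    for v in values:
        total += v
    return total


def sum_avx2(values):
    # Hypothetical stand-in for a hand-tuned AVX2 variant; it only has to
    # return the same result as the baseline.
    return sum(values)


# Ordered from most to least demanding; the baseline requires nothing.
CANDIDATES = [
    ({"avx2"}, sum_avx2),
    (set(), sum_baseline),
]


def select_impl(cpu_flags, candidates=CANDIDATES):
    """Pick the first candidate whose required features are all present."""
    for required, impl in candidates:
        if required <= cpu_flags:
            return impl
    raise RuntimeError("no baseline implementation registered")


# Resolve once at startup, then always call through the chosen function.
fast_sum = select_impl(detect_cpu_flags())
print(fast_sum.__name__, fast_sum([1, 2, 3, 4]))
```

The important property is that the baseline entry requires no features at all, so an old or unrecognized CPU gets the slower path instead of an "illegal instruction" crash.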

Of course, languages that run on (usually) JIT-ed VMs/runtimes have a bit of an advantage here, as the actual machine code is generated from source code or byte code only at runtime, at which point it is clear what kind of CPU is underneath the program. They can, but do not always, tailor the JIT-compiled code to the available CPU features. (Of course, every language/VM/runtime comes with its own set of pros and cons, and there is no silver bullet.)

To make matters even more complicated: compiling code to use the newest CPU features or newest optimization techniques does not guarantee that it will actually run faster. E.g. AVX-512 may actually slow down your code (when multi-threaded) on many CPUs, because the CPU may lower its clock frequency while running such instructions[1]. Or heavily "optimized" code may become larger in machine code, to the point where your "unoptimized" code runs faster because it fits in the CPU cache(s) properly while the "optimized" version does not. "-Os" optimized code may run faster than "-Ofast" optimized code for that reason. Or it may not. It depends on the actual code.
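Since the outcome depends on the actual code, the only reliable check is to measure both builds on a representative workload. A rough sketch of such a comparison follows; the "default" and "tuned" labels and the placeholder commands are hypothetical, and a serious benchmark would also control for warm-up, CPU frequency scaling, and background load.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=5):
    """Run cmd several times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        timings.append(time.perf_counter() - start)
    return timings


def compare(builds, runs=5):
    """Map each build label to its median run time in seconds."""
    return {
        label: statistics.median(time_command(cmd, runs))
        for label, cmd in builds.items()
    }


if __name__ == "__main__":
    # Placeholder workloads so the sketch runs anywhere; replace them with
    # the real binaries and a representative input.
    builds = {
        "default": [sys.executable, "-c", "sum(range(200_000))"],
        "tuned": [sys.executable, "-c", "sum(range(200_000))"],
    }
    for label, seconds in compare(builds).items():
        print(f"{label}: {seconds:.4f} s")
```

Using the median of several runs keeps one noisy run from deciding the comparison, which matters when the expected difference is only a few percent.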

I remember compiling ffmpeg and libx264 myself a bunch of years ago, with the "best" flags for my system, starting with "-march=" and "-Ofast" of course, thinking I was a tough, skillful super geek now. Imagine my surprise when I tested the performance against a default ffmpeg build and my optimized build was 2-5% slower.

[1] https://blog.cloudflare.com/on-the-dangers-of-intels-frequen...