
by mort96 282 days ago

You can do that to some limited degree, but not really.

There are more relevant modern examples, but one that I think illustrates the issue well is floating-point instructions. The x87 instruction set was the first set of floating-point instructions for x86 processors, introduced in 1980 with the 8087 coprocessor. In the late 90s/early 2000s, Intel released CPUs with the new SSE and SSE2 extensions, which took a new approach to floating point (x87 was really designed for use with a separate floating-point coprocessor, and its stack-based register design is unfortunate now that CPUs have native floating-point support).

So modern compilers generate SSE instructions rather than the (now considered obsolete) x87 instructions when working with floating point. Trying to run a program compiled with a modern compiler on a CPU without SSE support will just crash with an illegal instruction exception.

There are two main ways we could imagine supporting x87-only CPUs while using SSE instructions on CPUs with SSE:

Every time the compiler wants to generate a floating-point instruction (or sequence of floating-point instructions), it could generate the x87 instruction(s), the SSE instruction(s), and a conditional branch that picks one based on whether the CPU supports SSE. This would tank performance: any speed-up you get from using an SSE instruction instead of an x87 instruction is probably going to be outweighed by the branch.
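To make the first option concrete, here is a rough C sketch in which every floating-point operation checks a CPU-feature flag before choosing a code path. Plain C can't force x87 versus SSE code generation, so `mul_add_x87` and `mul_add_sse2` are hypothetical stand-ins for the two instruction sequences a compiler would emit, and `cpu_has_sse2` is a hypothetical flag assumed to be set once at startup.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag, assumed to be filled in once at program start
   (for example from CPUID). */
static bool cpu_has_sse2 = true;

/* Stand-ins for the two instruction sequences a compiler would emit;
   in plain C both bodies are identical. */
static double mul_add_x87(double a, double b, double c)  { return a * b + c; }
static double mul_add_sse2(double a, double b, double c) { return a * b + c; }

/* Option 1: a conditional branch in front of every floating-point
   operation, evaluated again on every call. */
static double mul_add(double a, double b, double c) {
    return cpu_has_sse2 ? mul_add_sse2(a, b, c) : mul_add_x87(a, b, c);
}

/* A hot loop pays for that check once per element. */
static double dot(const double *x, const double *y, size_t n) {
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc = mul_add(x[i], y[i], acc);
    return acc;
}
```

Note that the check sits inside the loop in `dot` and runs once per element, which is the overhead described above.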

The other option is to generate one x87 version and one SSE version of every function that uses floats, and let the dynamic linker sort out function calls, picking the x87 version on old CPUs and the SSE version on new CPUs. This would leave performance more or less unaffected, but in the worst case it would almost double your code size (since you may end up with two versions of almost every function).

And in fact, it's worse: the original SSE only supports 32-bit floats, while SSE2 adds 64-bit floats. So you want one version of every function that uses x87 for everything (for the really old CPUs), one version that uses SSE for 32-bit floats and x87 for 64-bit floats, and one version that uses SSE and SSE2 for all floats. Oh, and SSE3 added some useful instructions, so you want a fourth version of some functions that uses SSE3 instructions, with a slower fallback on systems without SSE3. Suddenly you're generating 4 versions of most functions. And this is only from SSE, without considering other axes along which CPUs differ.
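The second option amounts to choosing a function once instead of checking on every operation. A hedged C sketch: the four variants described above become hypothetical functions (generated by a macro with identical bodies, since plain C can't pick the instruction set per function), `struct cpu_features` is a hypothetical stand-in for real CPUID results, and `select_dot_variant` plays the role the text gives to the dynamic linker.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical feature flags, assumed to be detected once at startup. */
struct cpu_features {
    bool sse;   /* 32-bit floats in SSE registers */
    bool sse2;  /* 64-bit floats in SSE registers */
    bool sse3;  /* a few extra instructions */
};

typedef double (*dot_fn)(const double *, const double *, size_t);

/* Every variant gets the same body here; a real build would compile
   each one for a different instruction set. */
#define DEFINE_DOT(name)                                               \
    static double name(const double *x, const double *y, size_t n) {  \
        double acc = 0.0;                                              \
        for (size_t i = 0; i < n; i++)                                 \
            acc += x[i] * y[i];                                        \
        return acc;                                                    \
    }

DEFINE_DOT(dot_x87)      /* x87 for all floats */
DEFINE_DOT(dot_sse_x87)  /* SSE for 32-bit floats, x87 for 64-bit */
DEFINE_DOT(dot_sse2)     /* SSE/SSE2 for all floats */
DEFINE_DOT(dot_sse3)     /* SSE2 plus SSE3 instructions */

enum dot_variant { DOT_X87, DOT_SSE_X87, DOT_SSE2, DOT_SSE3 };

static const dot_fn dot_table[] = {
    [DOT_X87]     = dot_x87,
    [DOT_SSE_X87] = dot_sse_x87,
    [DOT_SSE2]    = dot_sse2,
    [DOT_SSE3]    = dot_sse3,
};

/* Decide once, roughly as a dynamic linker resolving the symbol would;
   later calls go straight through the chosen table entry. */
static enum dot_variant select_dot_variant(struct cpu_features f) {
    if (f.sse2 && f.sse3) return DOT_SSE3;
    if (f.sse2)           return DOT_SSE2;
    if (f.sse)            return DOT_SSE_X87;
    return DOT_X87;
}
```

On glibc-based systems, GCC's `ifunc` attribute lets the dynamic linker run a resolver like this at load time; the sketch uses a plain lookup table instead. Either way the binary carries four copies of the loop body, which is the code-size cost described above.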

You have to actively make a choice here about what to support. It doesn't make sense to ship every possible permutation of every function; you'd end up with massive executables. You typically assume a baseline instruction set from some point in the past 20 years, so you let your compiler go wild with SSE/SSE2/SSE3/SSE4 instructions and let your program crash on the i486. For specific functions that get a particularly large speed-up from something more exotic (say, AVX-512), you can manually include one exotic version and one fallback version of that function.
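A hedged sketch of "one exotic version and one fallback" using GCC/Clang extensions: the `target("avx512f")` attribute lets the compiler use AVX-512F inside that one function only, and `__builtin_cpu_supports` checks the running CPU. The function names and the elementwise-add workload are hypothetical; on non-x86 targets or compilers without these extensions, the preprocessor guard leaves only the fallback.

```c
#include <stddef.h>

/* Fallback: compiled for whatever baseline the rest of the program uses. */
static void add_f32_baseline(float *dst, const float *a, const float *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Same loop, but the compiler may vectorize it with AVX-512F here.
   Only safe to call after checking that the CPU supports it. */
__attribute__((target("avx512f")))
static void add_f32_avx512(float *dst, const float *a, const float *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
#endif

typedef void (*add_f32_fn)(float *, const float *, const float *, size_t);

/* Pick the exotic version only if the running CPU supports it. */
static add_f32_fn pick_add_f32(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f"))
        return add_f32_avx512;
#endif
    return add_f32_baseline;
}
```

Recent GCC and Clang can also generate such a pair automatically with the `target_clones` attribute; writing it out by hand just makes the dispatch visible.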

But the catch is that most of your program is gonna get compiled against some baseline: the more constrained that baseline is, the more CPUs you support, but the slower your program runs (though we're usually talking a single-digit percentage difference, not orders of magnitude).