by the_clarence 2805 days ago
I see a lot of applications trying to take advantage of SIMD, but what happens when you try to run them on systems that don't support these instructions? My guess is that you need to write multiple files, each using a different instruction set, and then figure out at runtime with cpuid which one to use. But isn't that cumbersome, and a way to inflate a codebase dramatically?
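A minimal sketch of the runtime dispatch the question describes, assuming GCC or Clang on x86-64. `__builtin_cpu_supports` wraps the cpuid check, and the `target` attribute lets a single function use AVX intrinsics without compiling the whole file with `-mavx`. The function names (`sum_scalar`, `sum_avx`, `sum_floats`) are hypothetical.

```c
#include <immintrin.h>
#include <stddef.h>

/* Portable fallback: runs on any x86-64 CPU. */
static float sum_scalar(const float *a, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* AVX version: the target attribute enables AVX for this function only,
   so the rest of the file can still be built for the generic baseline. */
__attribute__((target("avx")))
static float sum_avx(const float *a, size_t n) {
    __m256 acc = _mm256_setzero_ps();
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(a + i));
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float s = 0.0f;
    for (int k = 0; k < 8; k++)   /* horizontal sum of the 8 lanes */
        s += lanes[k];
    for (; i < n; i++)            /* leftover elements */
        s += a[i];
    return s;
}

typedef float (*sum_fn)(const float *, size_t);

/* Picks an implementation on the first call, based on the running CPU.
   Not synchronized: a multithreaded program would want call_once or an
   atomic pointer here. */
float sum_floats(const float *a, size_t n) {
    static sum_fn impl = NULL;
    if (!impl) {
        __builtin_cpu_init();
        impl = __builtin_cpu_supports("avx") ? sum_avx : sum_scalar;
    }
    return impl(a, n);
}
```

Callers only ever see `sum_floats`; the per-instruction-set variants stay private to one file. For arbitrary float data the two paths can return slightly different results, because the additions happen in a different order.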

Speaking of the Intel world, it's not that bad. There are three major versions right now: SSE4.1, AVX and AVX2 (AVX-512 is not popular yet).
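With only a few tiers to care about, the runtime check can collapse to picking the highest one the CPU reports. A minimal sketch, assuming GCC or Clang on x86-64; the enum and function names are hypothetical.

```c
/* Hypothetical tiers matching the three levels above, plus a baseline
   (SSE2 is guaranteed on every x86-64 CPU). */
enum simd_tier { TIER_BASELINE, TIER_SSE41, TIER_AVX, TIER_AVX2 };

/* Check from the newest extension down and return the first one found. */
enum simd_tier detect_simd_tier(void) {
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))   return TIER_AVX2;
    if (__builtin_cpu_supports("avx"))    return TIER_AVX;
    if (__builtin_cpu_supports("sse4.1")) return TIER_SSE41;
    return TIER_BASELINE;
}

const char *simd_tier_name(enum simd_tier t) {
    switch (t) {
    case TIER_AVX2:  return "AVX2";
    case TIER_AVX:   return "AVX";
    case TIER_SSE41: return "SSE4.1";
    default:         return "baseline (SSE2)";
    }
}
```

Each SIMD kernel then needs at most four variants, and in practice many codebases only ship two or three of them.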

In the past (roughly 10 years ago) it was a problem, as there were MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, XOP, 3DNow! and perhaps a few more extensions.

It's not a typo; there really are three S's :)

Sorry, I forgot that in HN comments the asterisk marks italics. There should be an asterisk after SSSE3.
> inflate a codebase dramatically

This is usually only done for very specific algorithms: Unicode validation, hash functions, things like that. Unless you have an absolutely tiny application (which you might, if you're targeting some kind of microcontroller), it's going to be a small percentage of your overall code size.
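To give a sense of how small such a kernel tends to be, here is a hypothetical ASCII fast path of the kind often used as a first step in UTF-8 validation (it is not a full UTF-8 validator). It uses only SSE2, which every x86-64 CPU has, so it needs no dispatch at all.

```c
#include <emmintrin.h>  /* SSE2: always available on x86-64 */
#include <stdbool.h>
#include <stddef.h>

/* Returns true if every byte is < 0x80, i.e. the buffer is plain ASCII
   (and therefore trivially valid UTF-8). Checks 16 bytes per iteration. */
bool is_ascii(const unsigned char *p, size_t n) {
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)(p + i));
        /* movemask gathers the top bit of each byte; any set bit means
           some byte in this chunk is >= 0x80. */
        if (_mm_movemask_epi8(chunk) != 0)
            return false;
    }
    for (; i < n; i++)   /* scalar tail for the last < 16 bytes */
        if (p[i] & 0x80)
            return false;
    return true;
}
```

The vectorized part is a handful of lines; the rest of a real validator (multi-byte sequences) is where the bulk of the code goes.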

In a microcontroller, I don't think you'll be needing AVX2...
I'm not sure where exactly the line is drawn between a microcontroller and a CPU, but even some lower-end ARM cores support SIMD instructions.
Generally speaking, I think if you care enough about performance to write manual SIMD code, being a little more cumbersome is a tradeoff you’re willing to make.
My understanding is that when you use intrinsics and build for a processor that doesn't support them, GCC, for example, will replace them with equivalent code.
Unfortunately, no.

That is the case with GCC's __builtin functions. With a few exceptions, intrinsics are basically macros for inline asm that the compiler can reason about.

If on x86-64 you use a _mm256* intrinsic and compile without AVX support you just get a compile error, not a pair of equivalent SSE instructions.
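A sketch of what that means in practice, assuming GCC or Clang on x86-64: the SSE fallback that splits one 256-bit add into two 128-bit adds has to be written by hand. The function names are hypothetical.

```c
#include <immintrin.h>

/* Adds two 8-float arrays using only SSE (baseline on x86-64): the
   "pair of equivalent SSE instructions" the compiler will not derive
   from _mm256_add_ps for you. */
void add8_sse(const float *a, const float *b, float *out) {
    __m128 lo = _mm_add_ps(_mm_loadu_ps(a),     _mm_loadu_ps(b));
    __m128 hi = _mm_add_ps(_mm_loadu_ps(a + 4), _mm_loadu_ps(b + 4));
    _mm_storeu_ps(out,     lo);
    _mm_storeu_ps(out + 4, hi);
}

/* The AVX version. Without -mavx or a target attribute like this one,
   the _mm256_* calls fail to compile. Only call it on CPUs where
   __builtin_cpu_supports("avx") is true. */
__attribute__((target("avx")))
void add8_avx(const float *a, const float *b, float *out) {
    __m256 v = _mm256_add_ps(_mm256_loadu_ps(a), _mm256_loadu_ps(b));
    _mm256_storeu_ps(out, v);
}
```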

Even worse: mostly you get run-time errors (an illegal-instruction crash) when the build machine supported the feature, your machine doesn't, and the feature-specific code isn't separated out via function multiversioning or by loading different shared libraries.
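A minimal sketch of function multiversioning with GCC's `target_clones` attribute (also supported by recent Clang), assuming an ELF platform with ifunc support such as Linux/glibc. The compiler emits one copy of the function per listed target and a resolver that picks one when the program loads, so the same binary runs with or without AVX2. The function name `scale` is hypothetical.

```c
#include <stddef.h>

/* One clone is compiled for AVX2 and one for the generic baseline; at
   higher optimization levels (e.g. -O3) the compiler can auto-vectorize
   each clone for its own target. */
__attribute__((target_clones("avx2", "default")))
void scale(float *x, size_t n, float k) {
    for (size_t i = 0; i < n; i++)
        x[i] *= k;
}
```

Callers just call `scale`; no cpuid check appears in the source.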
That is true. Here are a couple of downsides, though. First, you still need to build once for each architecture, either as different executables or as different object files, and provide some dispatch mechanism to use the right one based on what hardware is available.
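One such dispatch mechanism on Linux/glibc is a GNU indirect function (ifunc): a resolver runs once when the symbol is bound and returns the implementation to use. A minimal sketch, assuming GCC or Clang on x86-64 ELF; `dot`, `dot_scalar`, `dot_avx2` and `resolve_dot` are hypothetical names, and the AVX2 variant is just the same loop compiled for AVX2 rather than hand-written intrinsics.

```c
#include <stddef.h>

static int dot_scalar(const int *a, const int *b, size_t n) {
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i] * b[i];
    return s;
}

/* Same loop, but the compiler may use AVX2 instructions for it. */
__attribute__((target("avx2")))
static int dot_avx2(const int *a, const int *b, size_t n) {
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i] * b[i];
    return s;
}

typedef int (*dot_fn)(const int *, const int *, size_t);

/* Called by the dynamic loader, before constructors run, hence the
   explicit __builtin_cpu_init(). */
static dot_fn resolve_dot(void) {
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") ? dot_avx2 : dot_scalar;
}

/* Public symbol: every call goes straight to whatever resolve_dot chose. */
int dot(const int *a, const int *b, size_t n)
    __attribute__((ifunc("resolve_dot")));
```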

Second, if the intrinsics aren't natively supported, there may be faster alternatives than GCC's emulated version.

You must be thinking of GCC "builtins", because there is no emulation for x86 SIMD intrinsics (i.e. the things in <immintrin.h>).
Oh, indeed I was. Thanks for pointing out my error. I was specifically thinking about POPCNT.
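The POPCNT case shows the difference well. A sketch assuming GCC or Clang on x86-64, with hypothetical function names: `__builtin_popcount` compiles to the POPCNT instruction when the target has it (e.g. with `-mpopcnt`) and to a software fallback otherwise, while the `_mm_popcnt_u32` intrinsic maps directly to the instruction and needs the feature enabled.

```c
#include <immintrin.h>

/* GCC builtin: always compiles; uses POPCNT only if the target allows it. */
int bits_builtin(unsigned x) {
    return __builtin_popcount(x);
}

/* Intrinsic: requires the popcnt feature (enabled here per function).
   Only call it after __builtin_cpu_supports("popcnt") returns true. */
__attribute__((target("popcnt")))
int bits_intrinsic(unsigned x) {
    return _mm_popcnt_u32(x);
}
```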
Darwin platforms ship binaries with different slices for different generations of Intel processors: the generic x86_64 slice and the newer x86_64h (Haswell) slice, which supports more features.
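With per-slice (or per-object-file) builds, the choice happens at compile time instead of at runtime. A minimal sketch, assuming GCC or Clang: the compiler defines `__AVX2__` when AVX2 is enabled for the whole translation unit (e.g. `-mavx2` or `-march=haswell`). That an x86_64h slice is built with AVX2 enabled is an assumption based on it targeting Haswell; `compiled_path` is a hypothetical name.

```c
/* Each build gets exactly one code path, fixed by its compiler flags;
   there is no cpuid check at runtime. */
const char *compiled_path(void) {
#if defined(__AVX2__)
    return "avx2";
#elif defined(__SSE4_1__)
    return "sse4.1";
#else
    return "baseline";
#endif
}
```

The loader (or installer) picks the right build for the machine, so each binary only contains code its target can run.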