Bad news. For SIMD there are no cross-platform intrinsics: Intel intrinsics map directly to SSE/AVX instructions, and ARM intrinsics map directly to NEON instructions.
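Here is a minimal sketch of what that means in practice: one operation (adding two float arrays four lanes at a time) has to be written once per API. The function `add_arrays` and the `USE_SSE`/`USE_NEON` macros are hypothetical names for illustration. The intrinsics themselves (`_mm_loadu_ps`, `_mm_add_ps`, `_mm_storeu_ps`, `vld1q_f32`, `vaddq_f32`, `vst1q_f32`) are the standard SSE and NEON ones.

```cpp
#include <cassert>
#include <cstddef>

// Pick one ISA-specific API at compile time (macro names are hypothetical).
#if defined(__SSE__) || defined(_M_X64)
#define USE_SSE 1
#include <xmmintrin.h>  // Intel SSE intrinsics
#elif defined(__ARM_NEON)
#define USE_NEON 1
#include <arm_neon.h>   // ARM NEON intrinsics
#endif

// out[i] = a[i] + b[i]; n must be a multiple of 4 to keep the sketch short.
void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
  assert(n % 4 == 0);
  for (std::size_t i = 0; i < n; i += 4) {
#if defined(USE_SSE)
    __m128 va = _mm_loadu_ps(a + i);  // SSE: unaligned load of 4 floats
    __m128 vb = _mm_loadu_ps(b + i);
    _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
#elif defined(USE_NEON)
    float32x4_t va = vld1q_f32(a + i);  // NEON: load 4 floats
    float32x4_t vb = vld1q_f32(b + i);
    vst1q_f32(out + i, vaddq_f32(va, vb));
#else
    for (std::size_t j = i; j < i + 4; ++j) out[j] = a[j] + b[j];  // scalar fallback
#endif
  }
}
```

The two branches share nothing but the loop; every further ISA (AVX2, AVX-512, SVE, ...) would need yet another branch.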
I beg to differ :) std::experimental::simd has a very limited set of operations: mostly just math, very few shuffles/swizzles. Last I checked, it also only worked in a recent version of GCC.
We do indeed have cross-platform intrinsics here: github.com/google/highway. Disclosure: I am the main author.
Not really, unfortunately, and it’s a pre-existing framework for teaching a class, so simplicity of compilation is extra important. Also, if I try to isolate the SIMD bits in C++, I’ll lose the opportunity to have them inlined, which will defeat the optimization purpose.
> Also, if I try to isolate the SIMD bits in C++, I’ll lose the opportunity to have them inlined, which will defeat the optimization purpose.
Agreed. Usually the interface would be something like RunEntireAlgorithm(), not DotProduct().
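A minimal sketch of why the granularity matters, assuming GCC or Clang. Here `__attribute__((noinline))` stands in for "compiled in a separate file", and `dot_product4`, `sum_of_dots_fine` and `run_entire_algorithm` are hypothetical names. The loops are plain scalar code to keep it short; the point is only where the call boundary sits.

```cpp
#include <cassert>
#include <cstddef>

// Fine-grained interface: a tiny kernel behind a call the caller cannot inline.
__attribute__((noinline)) float dot_product4(const float* a, const float* b) {
  float sum = 0.0f;
  for (int i = 0; i < 4; ++i) sum += a[i] * b[i];
  return sum;
}

// The caller pays one non-inlinable call per 4 elements, inside the hot loop.
float sum_of_dots_fine(const float* a, const float* b, std::size_t n) {
  assert(n % 4 == 0);
  float total = 0.0f;
  for (std::size_t i = 0; i < n; i += 4) total += dot_product4(a + i, b + i);
  return total;
}

// Coarse-grained interface: the whole loop sits behind a single call, so the
// boundary cost is paid once and the compiler sees the entire hot loop.
__attribute__((noinline)) float run_entire_algorithm(const float* a, const float* b,
                                                     std::size_t n) {
  assert(n % 4 == 0);
  float total = 0.0f;
  for (std::size_t i = 0; i < n; i += 4) {
    float sum = 0.0f;
    for (int j = 0; j < 4; ++j) sum += a[i + j] * b[i + j];
    total += sum;
  }
  return total;
}
```

Both functions compute the same value in the same order; only the placement of the call boundary differs.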
> For those that are new to this, can you give an example of a kind of computation or algorithm which is well-served by your project but not possible with vector extensions?
Sure. Vector extensions are OK-ish for simple math, but JPEG XL includes nontrivial cross-lane operations such as transposes and boundary handling for convolution.
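For the "simple math" case, a sketch using the GCC/Clang `vector_size` extension. The names `saxpy` and `v4f` are illustrative, not from the thread. The compiler lowers the vector arithmetic to SSE, NEON, etc. for whichever target it builds for.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// GCC/Clang vector extension: 4 x float, lowered by the compiler per target.
typedef float v4f __attribute__((vector_size(16)));

// y = a * x + y, four lanes at a time; n must be a multiple of 4.
void saxpy(float a, const float* x, float* y, std::size_t n) {
  assert(n % 4 == 0);
  for (std::size_t i = 0; i < n; i += 4) {
    v4f vx, vy;
    std::memcpy(&vx, x + i, sizeof vx);  // memcpy avoids alignment/aliasing issues
    std::memcpy(&vy, y + i, sizeof vy);
    vy = a * vx + vy;  // the scalar a is broadcast to all lanes
    std::memcpy(y + i, &vy, sizeof vy);
  }
}
```

Elementwise code like this is where the extensions work well; there is no lane-crossing data movement.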
__builtin_shufflevector requires a vector length known at compile time, and the compiler can pessimize it (e.g. by fusing two shuffles into one general all-to-all permute, which is more expensive than the two simple shuffles).
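A sketch of the kind of cross-lane operation meant here: a 4x4 float transpose built from two-input shuffles, assuming GCC or Clang. Clang spells the builtin `__builtin_shufflevector`, while GCC has long offered `__builtin_shuffle` with an integer mask vector, so the hypothetical `SHUFFLE4` macro picks one. The lane indices are compile-time constants written for exactly four lanes, which is why this style does not carry over to variable-length vectors such as SVE or RISC-V V.

```cpp
#include <cassert>
#include <cstring>

typedef float v4f __attribute__((vector_size(16)));
typedef int v4i __attribute__((vector_size(16)));

// Two-input shuffle: indices 0-3 pick lanes of a, 4-7 pick lanes of b.
#if defined(__clang__)
#define SHUFFLE4(a, b, i0, i1, i2, i3) __builtin_shufflevector(a, b, i0, i1, i2, i3)
#else  // GCC: mask is an integer vector with the same lane count and width
#define SHUFFLE4(a, b, i0, i1, i2, i3) __builtin_shuffle(a, b, v4i{i0, i1, i2, i3})
#endif

// Transposes a row-major 4x4 matrix (rows a, b, c, d). Safe in place, since
// all rows are loaded before anything is stored.
void transpose4x4(const float* in, float* out) {
  v4f r0, r1, r2, r3;
  std::memcpy(&r0, in + 0, sizeof r0);
  std::memcpy(&r1, in + 4, sizeof r1);
  std::memcpy(&r2, in + 8, sizeof r2);
  std::memcpy(&r3, in + 12, sizeof r3);
  v4f t0 = SHUFFLE4(r0, r1, 0, 4, 1, 5);  // a0 b0 a1 b1
  v4f t1 = SHUFFLE4(r0, r1, 2, 6, 3, 7);  // a2 b2 a3 b3
  v4f t2 = SHUFFLE4(r2, r3, 0, 4, 1, 5);  // c0 d0 c1 d1
  v4f t3 = SHUFFLE4(r2, r3, 2, 6, 3, 7);  // c2 d2 c3 d3
  v4f o0 = SHUFFLE4(t0, t2, 0, 1, 4, 5);  // a0 b0 c0 d0
  v4f o1 = SHUFFLE4(t0, t2, 2, 3, 6, 7);  // a1 b1 c1 d1
  v4f o2 = SHUFFLE4(t1, t3, 0, 1, 4, 5);  // a2 b2 c2 d2
  v4f o3 = SHUFFLE4(t1, t3, 2, 3, 6, 7);  // a3 b3 c3 d3
  std::memcpy(out + 0, &o0, sizeof o0);
  std::memcpy(out + 4, &o1, sizeof o1);
  std::memcpy(out + 8, &o2, sizeof o2);
  std::memcpy(out + 12, &o3, sizeof o3);
}
```

Each pair of shuffles here is cheap on its own; the pessimization mentioned above is the compiler merging them into a single general permute.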
Also, vqsort (https://github.com/google/highway/tree/master/hwy/contrib/so...) consists almost entirely of operations not supported by the extensions, and it works out of the box on variable-length RISC-V and SVE, which the compiler extensions cannot do.
Just a heads-up: as far as I know, that’s more of a porting/learning tool than a production tool.
I remember us looking into this in depth and deciding to hand-write the SSE intrinsics. They usually map 1:1, but we saw some unexpected differences in algorithm output between the x86 binary and the ARM binary compiled with this.
But this was back in 2019 or so; maybe it’s better now!