by exDM69 3732 days ago
> And wrappers exists in the C++ ecosystem, C programmers are stuck to intrinsics.

If you can accept working with GNU extensions that are available in recent-ish GCC and Clang (but not MSVC, not sure about Intel ICC), there are pretty nice vector extensions [0].

With them you get the standard binary operators for arithmetic (+, -, *, / etc.) and shuffling with __builtin_shuffle (GCC's spelling; Clang's equivalent is __builtin_shufflevector). These are CPU independent: the same code compiles neatly to ARM NEON as well as x86 SSE+AVX+FMA. All you need is a typedef with an __attribute__.
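To make that concrete, here is a minimal sketch of the typedef plus operators plus shuffle. The names vec4, ivec4, vec4_madd and vec4_reverse are made up for illustration and are not taken from the linked project. It assumes GCC or Clang, with the shuffle builtin chosen per compiler.

```c
#include <stdint.h>

/* Four packed floats in one 16-byte vector (illustrative name). */
typedef float vec4 __attribute__((vector_size(16)));
/* Integer vector of the same shape; GCC's shuffle takes its mask in this form. */
typedef int32_t ivec4 __attribute__((vector_size(16)));

/* a * b + c, written with ordinary operators that apply lane by lane. */
static inline vec4 vec4_madd(vec4 a, vec4 b, vec4 c)
{
    return a * b + c;
}

/* Reverse the lanes: (x, y, z, w) -> (w, z, y, x). */
static inline vec4 vec4_reverse(vec4 v)
{
#if defined(__clang__)
    /* Clang's builtin takes two source vectors and constant lane indices. */
    return __builtin_shufflevector(v, v, 3, 2, 1, 0);
#else
    /* GCC's builtin takes the source vector and an integer mask vector. */
    const ivec4 mask = { 3, 2, 1, 0 };
    return __builtin_shuffle(v, mask);
#endif
}
```

Individual lanes can be read with plain subscripts (v[0], v[1], ...), which is handy for debugging and tests.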

The vector extensions don't cover the whole instruction set, but the vector types are compatible with the native __m128 and NEON types, so you can resort to intrinsics when necessary.
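A rough sketch of dropping down to an intrinsic for something the operators don't express, here a lane-wise minimum. vec4 and vec4_min are hypothetical names. The sketch only takes the SSE path when __SSE__ is defined and otherwise falls back to a plain loop; a NEON branch could be added the same way but is left out here.

```c
#if defined(__SSE__)
#include <xmmintrin.h>   /* __m128, _mm_min_ps */
#endif

/* Four packed floats in one 16-byte vector (illustrative name). */
typedef float vec4 __attribute__((vector_size(16)));

/* Lane-wise minimum of two vectors. */
static inline vec4 vec4_min(vec4 a, vec4 b)
{
#if defined(__SSE__)
    /* Same 16-byte layout, so the casts only reinterpret the type. */
    return (vec4)_mm_min_ps((__m128)a, (__m128)b);
#else
    /* Portable fallback: compare lane by lane. */
    vec4 r;
    for (int i = 0; i < 4; i++)
        r[i] = a[i] < b[i] ? a[i] : b[i];
    return r;
#endif
}
```

The rest of the code keeps using vec4 and the ordinary operators; only the one function that needs an intrinsic becomes CPU-specific.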

However, for a lot of SIMD tasks I encounter, just basic arithmetic + shuffles is more than 80% of what I need.

If you want to see some examples, take a look at my collection of 3D graphics and physics related SIMD routines [1]. (Note: this project could use some help. Let me know if you're interested in doing something with it, or in porting some of the hand-optimized routines to more widely used math libraries like glm.)

[0] https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html#Ve...
[1] https://github.com/rikusalminen/threedee-simd

> If you can accept working with GNU extensions that are available in recent-ish GCC and Clang

My private project is in C++, so that isn't an issue there, but at my current company we also use MSVC. I wish we could abandon that compiler and work with GCC or Clang only.

> However, for a lot of SIMD tasks I encounter, just basic arithmetic + shuffles is more than 80% of what I need.

Your remaining 20% is my 80%. :)

> ... but at my current company we also use MSVC. I wish we could abandon that compiler and work with GCC or Clang only.

Good news! These days you can produce MSVC-compatible binaries with Clang, or even use Clang as the compiler from within the Visual Studio IDE.

Whether or not you can do this in practice is another matter, but it can be done.

> Your remaining 20% is my 80%. :)

Yeah, if you look at my examples, they're rather straightforward arithmetic with 4-dimensional vectors. There's very little need for any integer arithmetic or more exotic combinations of operations, just a little fused multiply-add here and there.

But I haven't seen a better method for this. Most of the code is CPU-agnostic and will compile to x86 or ARM code using whatever instruction sets are enabled by the compiler arguments (e.g. -mavx2 or -march=native). I really haven't seen a SIMD math lib with so little duplication for different CPUs elsewhere.
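As an illustration of what such a CPU-agnostic routine can look like, here is a small sketch of a 4D dot product. vec4 and vec4_dot are made-up names, not code from the repository. There are no intrinsics or #ifdefs in it; which instructions come out is left to the compiler and to the flags you pass.

```c
/* Four packed floats in one 16-byte vector (illustrative name). */
typedef float vec4 __attribute__((vector_size(16)));

/* 4D dot product: one lane-wise multiply, then a horizontal sum. */
static inline float vec4_dot(vec4 a, vec4 b)
{
    vec4 m = a * b;                      /* lane-wise products */
    return (m[0] + m[1]) + (m[2] + m[3]); /* add the four lanes together */
}
```

The same source should build unchanged for x86 (for example with -mavx2) or for ARM with NEON enabled, and the generated code differs only because of the target and flags.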