by ashvardanian
572 days ago
There are dozens of libraries, frameworks, and compiler toolchains that try to abstract away SIMD capabilities, but I don't think it's a great approach. The only two approaches that still make sense to me:
A. Writing serial, vectorization-aware code in a natively compiled language and hoping the compiler will auto-vectorize it.
B. Implementing natively for every hardware platform, as the ISA differences are too big to efficiently abstract away anything beyond float multiplication and addition on 128-bit registers.
This article is, in a way, an attempt to show how big the differences are, even for simple data-parallel floating-point tasks.

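To make the two approaches concrete, here is a minimal sketch in C using a hypothetical SAXPY kernel (y = a*x + y), which is not taken from the article. `saxpy_serial` is approach A: a plain loop with no cross-iteration dependency, which GCC and Clang may auto-vectorize depending on version and flags (typically at -O3). `saxpy_native` is approach B: one hand-written path per ISA (SSE on x86, NEON on Arm), both limited to 128-bit float multiply and add, plus a scalar tail. The function names are illustrative only.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>   /* x86 SSE: 128-bit float vectors */
#elif defined(__ARM_NEON)
#include <arm_neon.h>    /* Arm NEON: 128-bit float vectors */
#endif

/* Approach A: serial, vectorization-aware code. `restrict` promises no
   aliasing, which makes auto-vectorization easier for the compiler. */
void saxpy_serial(float a, const float *restrict x, float *restrict y, size_t n) {
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Approach B: a native implementation per ISA, 4 floats at a time. */
void saxpy_native(float a, const float *restrict x, float *restrict y, size_t n) {
    size_t i = 0;
#if defined(__SSE__)
    __m128 va = _mm_set1_ps(a);                      /* broadcast a to 4 lanes */
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);             /* unaligned loads */
        __m128 vy = _mm_loadu_ps(y + i);
        _mm_storeu_ps(y + i, _mm_add_ps(_mm_mul_ps(va, vx), vy));
    }
#elif defined(__ARM_NEON)
    float32x4_t va = vdupq_n_f32(a);                 /* broadcast a to 4 lanes */
    for (; i + 4 <= n; i += 4) {
        float32x4_t vx = vld1q_f32(x + i);
        float32x4_t vy = vld1q_f32(y + i);
        vst1q_f32(y + i, vaddq_f32(vmulq_f32(va, vx), vy));
    }
#endif
    /* Scalar tail; on other ISAs this loop handles the whole array. */
    for (; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Even in this tiny kernel the ISAs diverge: the intrinsics path does a separate multiply and add, while a compiler targeting a CPU with FMA may contract the serial loop into fused multiply-adds, so the two versions can differ in the last bit for inputs that are not exactly representable.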
.NET has roughly three vector APIs:
- Vector<T>, a vector whose width is defined by the platform and which exposes a common set of operations
- Vector64/128/256/512<T>, fixed-width vectors with a wider API than Vector<T>
- Platform intrinsics: basically immintrin.h
Notably, the platform intrinsics operate on the corresponding VectorXXX<T> types, which lets you write the common parts of an algorithm portably and apply platform intrinsics only where it makes sense. Also, some methods have 'Unsafe' and 'Native' variants that allow a vector operation to exhibit platform-specific behavior (shuffles, for example), since in many situations that is still the desired output for the common case.
The .NET compiler's codegen for these is competitive with GCC and sometimes with Clang. It has gotten particularly good at lowering AVX-512.
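C has no direct equivalent of this .NET design, but GCC and Clang vector extensions give a comparable split, sketched below as an analogy only (this is not .NET code). The portable type `v4f` plays the role of Vector128<float> for the bulk of the work, and because it is the same size as `__m128`, it can be cast straight into an SSE intrinsic for the one step where a platform-specific sequence helps, here the horizontal reduction. `v4f`, `hsum_v4f`, and `sum_floats` are hypothetical names, and the code assumes a GCC- or Clang-compatible compiler.

```c
#include <stddef.h>
#include <string.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Portable 4-lane float vector (GCC/Clang vector extension). */
typedef float v4f __attribute__((vector_size(16)));

/* Horizontal sum: the one place we drop to platform intrinsics. */
static float hsum_v4f(v4f v) {
#if defined(__SSE__)
    __m128 m  = (__m128)v;                                   /* same-size vector cast */
    __m128 hi = _mm_movehl_ps(m, m);                         /* {v2, v3, v2, v3} */
    __m128 s  = _mm_add_ps(m, hi);                           /* {v0+v2, v1+v3, ...} */
    __m128 s1 = _mm_shuffle_ps(s, s, _MM_SHUFFLE(1, 1, 1, 1)); /* broadcast lane 1 */
    return _mm_cvtss_f32(_mm_add_ss(s, s1));                 /* (v0+v2) + (v1+v3) */
#else
    return (v[0] + v[2]) + (v[1] + v[3]);                    /* same order, portable */
#endif
}

/* Sum of n floats: the main loop uses only portable vector arithmetic. */
float sum_floats(const float *x, size_t n) {
    v4f acc = {0.0f, 0.0f, 0.0f, 0.0f};
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        v4f chunk;
        memcpy(&chunk, x + i, sizeof chunk);  /* alignment-safe load */
        acc += chunk;                         /* lane-wise add, any ISA */
    }
    float total = hsum_v4f(acc);
    for (; i < n; ++i)                        /* scalar tail */
        total += x[i];
    return total;
}
```

Only `hsum_v4f` needs a per-ISA variant; the loop compiles to native vector adds on any target the compiler supports, which mirrors the "portable core, intrinsics in hot spots" pattern described above.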