If you like SIMD and would like to dabble in it, I strongly recommend trying it out in C# via its platform-agnostic SIMD abstraction. It is very accessible, especially if you already know a little C or C++, and it produces very competent codegen for AdvSimd, SSE2/4.2/AVX1/2/AVX512, WASM's Packed SIMD and, in .NET 9, SVE1/2: https://github.com/dotnet/runtime/blob/main/docs/coding-guid...

Here's an example of a "checked" sum over a span of integers that uses the platform-specific vector width: https://github.com/dotnet/runtime/blob/main/src/libraries/Sy...

Other examples:

CRC64: https://github.com/dotnet/runtime/blob/main/src/libraries/Sy...

Hamming distance: https://github.com/dotnet/runtime/blob/main/src/libraries/Sy...

The default syntax is a bit ugly in my opinion, but it can be significantly improved with helper methods, as in this port of simdutf's UTF-8 code point counting: https://github.com/U8String/U8String/blob/main/Sources/U8Str...

There are more advanced scenarios too. The BepuPhysics2 engine heavily leverages SIMD to perform as fast as PhysX's CPU back-end: https://github.com/bepu/bepuphysics2/blob/master/BepuPhysics...

Note that practically none of these need to reach for platform-specific intrinsics (except for replacing movemask emulation with an efficient ARM64 alternative). They use the same code path on all platforms, varied by vector width rather than by specific ISA.
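To give a feel for the width-agnostic style, here is a minimal sketch of a checked int sum written against `System.Numerics.Vector<T>`, whose lane count (`Vector<int>.Count`) is chosen by the runtime for the current CPU. This is not the dotnet/runtime implementation linked above; `SimdSum` and `CheckedSum` are hypothetical names, and it assumes a .NET version with the `Vector<T>(ReadOnlySpan<T>)` constructor (.NET Core 3.0 or later). Overflow per lane is detected with the usual sign-bit trick: a signed add overflowed exactly when the result's sign differs from the signs of both operands.

```csharp
using System;
using System.Numerics;

static class SimdSum
{
    // Hypothetical helper (not the dotnet/runtime code): a checked sum of ints.
    // The same code runs on every ISA; only Vector<int>.Count varies
    // (e.g. 4 lanes on 128-bit SIMD, 8 with AVX2).
    public static int CheckedSum(ReadOnlySpan<int> values)
    {
        int width = Vector<int>.Count;
        var acc = Vector<int>.Zero;       // per-lane partial sums
        var overflow = Vector<int>.Zero;  // sign bit set in any lane that overflowed
        int i = 0;

        for (; i <= values.Length - width; i += width)
        {
            var data = new Vector<int>(values.Slice(i, width));
            var sum = acc + data; // lane-wise add, wraps on overflow
            // Signed overflow happened iff sum's sign differs from both operands' signs.
            overflow |= (sum ^ acc) & (sum ^ data);
            acc = sum;
        }

        // A negative lane in 'overflow' means some partial sum overflowed.
        if (Vector.LessThanAny(overflow, Vector<int>.Zero))
            throw new OverflowException();

        // Reduce the lanes and add the scalar tail using C# checked arithmetic.
        int total = 0;
        checked
        {
            for (int lane = 0; lane < width; lane++) total += acc[lane];
            for (; i < values.Length; i++) total += values[i];
        }
        return total;
    }
}
```

Usage is just `SimdSum.CheckedSum(array)`, since `int[]` converts implicitly to `ReadOnlySpan<int>`. One design caveat of this sketch: because lanes accumulate independently, which inputs throw can differ from a plain sequential `checked` loop when an intermediate sum overflows but the final total would fit.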