Hacker News
by krastanov 1144 days ago
To echo the sibling: while this should be avoided in Python, in languages like Zig, C++, Julia, and Rust you can expect the compiler to SIMD-ify these expressions.
1 comment

Somewhat, but I think people vastly overestimate what auto-vectorizers can actually do.

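A common example: if there's any floating-point accumulation/reduction, compilers will mostly fail to generate SIMD unless you use `-funsafe-math-optimizations`-type flags (e.g. `-ffast-math` or `-fassociative-math`). Floating-point addition is not associative, and vectorizing a reduction changes the order of the additions, so the compiler is not allowed to do it by default. Sum of squares is the classic example (not saying that specific operation is used in NN).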
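To make the reduction problem concrete, here is a minimal C++ sketch (function names are made up for illustration). The first function is the strict left-to-right sum the compiler has to preserve by default. The second spells out eight independent partial sums in the source, which fixes the summation order explicitly and is typically much easier for a compiler to map onto SIMD lanes without any fast-math flag. The last function just shows that reassociating float additions really can change the answer.

```cpp
#include <cstddef>
#include <vector>

// Strict left-to-right accumulation: every add depends on the previous one,
// so the compiler may not reorder it unless a reassociation flag is given.
float sum_squares_scalar(const std::vector<float>& x) {
    float acc = 0.0f;
    for (float v : x) acc += v * v;
    return acc;
}

// Eight independent partial sums (kLanes is an arbitrary choice here).
// The summation order is now written out explicitly, lane by lane.
float sum_squares_lanes(const std::vector<float>& x) {
    constexpr std::size_t kLanes = 8;
    float acc[kLanes] = {};
    const std::size_t n = x.size();
    const std::size_t main_end = n - n % kLanes;
    for (std::size_t i = 0; i < main_end; i += kLanes)
        for (std::size_t l = 0; l < kLanes; ++l)
            acc[l] += x[i + l] * x[i + l];
    float total = 0.0f;
    for (std::size_t l = 0; l < kLanes; ++l) total += acc[l];
    // Leftover elements that did not fill a whole group of kLanes.
    for (std::size_t i = main_end; i < n; ++i) total += x[i] * x[i];
    return total;
}

// Returns false on IEEE-754 single precision: (a + b) + c is 1, while
// a + (b + c) is 0 because -1e8f + 1.0f rounds back to -1e8f.
// volatile keeps the compiler from folding or widening the intermediates.
bool float_add_is_associative_here() {
    volatile float a = 1e8f, b = -1e8f, c = 1.0f;
    volatile float ab = a + b;
    volatile float bc = b + c;
    volatile float left = ab + c;
    volatile float right = a + bc;
    return left == right;
}
```

The two sums agree exactly on small integer inputs but can differ in the last bits on general data, which is exactly why the compiler will not make this transformation for you.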

Explicit vectorization (e.g., using intrinsics) is almost always a relatively simple way to get orders of magnitude speedup compared to auto-vectorization, because you pick the summation order yourself and sidestep the problem above. Data layouts usually need to change as well (AoS vs SoA, etc.), which a compiler won't do for you, though NN people seem to write decent data layouts.
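A rough sketch of what that looks like for sum of squares, assuming x86 SSE intrinsics from `<xmmintrin.h>` and falling back to plain scalar code elsewhere. The `HAVE_SSE` macro, the struct names, and the function names are all hypothetical, and a real kernel would likely use wider vectors and several accumulators. The SoA struct is there to show why layout matters: each field is contiguous, so it can be handed straight to the kernel, whereas in the AoS version the `x` values sit 12 bytes apart.

```cpp
#include <cstddef>
#include <vector>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_SSE 1  // hypothetical macro, only used in this sketch
#endif

// Array-of-structs: consecutive x values are strided, awkward for SIMD loads.
struct ParticleAoS { float x, y, z; };

// Struct-of-arrays: every field is one contiguous run of floats.
struct ParticlesSoA { std::vector<float> x, y, z; };

// Sum of squares with four explicit SIMD lanes when SSE is available.
float sum_squares_simd(const float* x, std::size_t n) {
    std::size_t i = 0;
    float total = 0.0f;
#ifdef HAVE_SSE
    __m128 acc = _mm_setzero_ps();
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(x + i);           // unaligned load of 4 floats
        acc = _mm_add_ps(acc, _mm_mul_ps(v, v));  // acc += v * v, per lane
    }
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    total = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);  // horizontal sum
#endif
    // Scalar tail (or the whole array when SSE is not available).
    for (; i < n; ++i) total += x[i] * x[i];
    return total;
}

// With SoA the x field can be passed to the kernel directly.
float sum_x_squared(const ParticlesSoA& p) {
    return sum_squares_simd(p.x.data(), p.x.size());
}
```

Doing the same over a `std::vector<ParticleAoS>` would need a gather or a shuffle step first, which is the layout change the comment above is pointing at.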

I don't have any experience with `#pragma omp`-type approaches, which may be a middle ground.
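For anyone curious what the `#pragma omp` route looks like, here is a minimal sketch using the OpenMP `simd` directive with a `reduction` clause (the function name is made up). The clause tells the compiler it may reorder the additions for this one loop instead of relaxing floating-point rules for the whole file. This assumes a compiler with OpenMP SIMD support enabled, e.g. `-fopenmp-simd` or `-fopenmp` on GCC and Clang; without such a flag the pragma is ignored and the loop stays an ordinary scalar loop. Whether it actually vectorizes still depends on the compiler and target.

```cpp
#include <cstddef>

// Sum of squares with an OpenMP SIMD reduction hint on the loop.
float sum_squares_omp_simd(const float* x, std::size_t n) {
    float sum = 0.0f;
    // Permits the compiler to keep per-lane partial sums of `sum`.
    #pragma omp simd reduction(+ : sum)
    for (std::size_t i = 0; i < n; ++i) {
        sum += x[i] * x[i];
    }
    return sum;
}
```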