by dundarious
1152 days ago
Somewhat, but I think people vastly overestimate what compilers can auto-vectorize. A common example: whenever there's an accumulation/reduction, compilers will almost entirely fail to generate SIMD unless you use `-funsafe-math-optimizations`-type flags, because floating-point addition is not associative and vectorizing the loop would change the order of the additions. Sum of squares is the classic example (not saying that specific operation is used in NNs). Explicit vectorization (e.g., using intrinsics) is almost always a relatively simple way to get a large speedup over auto-vectorization (potentially an order of magnitude), because of the above. It also helps because data layouts usually need to change as well (AoS vs. SoA, etc.), though NN people seem to write decent data layouts. I don't have any experience with `#pragma omp`-type approaches, which may be a middle ground.
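To make the reduction point concrete, here's a rough sketch in plain C. It doesn't use real intrinsics (to stay portable), and the function names `sum_squares_scalar` and `sum_squares_lanes` are made up for illustration. The first version has a single accumulator, so under strict floating-point rules the additions must happen in order. The second does the reassociation by hand with four independent partial sums, which is roughly the shape an SSE-width intrinsics version would have. Whether a given compiler then turns the second loop into SIMD depends on the compiler and flags, so treat this as an illustration of the idea, not a benchmark.

```c
#include <assert.h>
#include <stddef.h>

/* Single accumulator: every add depends on the previous one, and the
   compiler may not reorder the adds without relaxed FP flags. */
float sum_squares_scalar(const float *x, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += x[i] * x[i];
    return acc;
}

/* Hand-reassociated: four independent partial sums, one per "lane".
   This changes the summation order, so the result can differ from the
   scalar version in the last bits for general inputs. */
float sum_squares_lanes(const float *x, size_t n) {
    assert(x != NULL || n == 0);
    float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    size_t i = 0;

    /* Main loop: process four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        for (int lane = 0; lane < 4; ++lane)
            acc[lane] += x[i + lane] * x[i + lane];
    }

    /* Horizontal reduction of the four partial sums. */
    float total = (acc[0] + acc[1]) + (acc[2] + acc[3]);

    /* Scalar tail for the remaining 0..3 elements. */
    for (; i < n; ++i)
        total += x[i] * x[i];
    return total;
}
```

The key design point is that the programmer, not the compiler, accepts the change in summation order. That's the same decision `-funsafe-math-optimizations` makes globally, just scoped to one loop.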