by garmaine
1663 days ago
See the #pragma directives in this document: http://www.audentia-gestion.fr/CRAY/PDF/Cray_C_and_C___Refer... You literally just write regular old C code doing a tight inner-loop computation, and use pragmas to tell the compiler what it needs to know to parallelize safely. Of course, these days you can do the same thing in any vectorizing compiler. But the point is that a modern vectorizing compiler has to do some pretty impressive transformations to generate SIMD code that looks nothing like the original, whereas the Cray code compiles to pretty much the same thing when vectorized.
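To make the "regular old C plus a pragma" style concrete, here is a minimal sketch. It uses OpenMP's `#pragma omp simd` as a stand-in, not the Cray directive from the linked document, and the function name `saxpy` is just an illustrative choice. Without OpenMP SIMD support enabled, the compiler ignores the pragma and the loop still runs as ordinary scalar code.

```c
#include <assert.h>
#include <stddef.h>

/* Computes y[i] = a * x[i] + y[i] for i in [0, n).
 * `restrict` promises the compiler that x and y do not overlap, and the
 * pragma asserts that loop iterations are independent, so the loop can be
 * vectorized without the compiler having to prove it.
 * NOTE: `#pragma omp simd` is an assumed stand-in for a Cray-style
 * vectorization directive; it is ignored if OpenMP SIMD is not enabled. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(x != NULL && y != NULL);

    #pragma omp simd
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The source stays a plain scalar loop; whether it actually ends up as SIMD instructions depends on the compiler and the flags used (for example `-fopenmp-simd` on GCC or Clang).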
In very tight situations it's common to write SIMD intrinsics directly rather than rely on the compiler's ability to make the transformations itself. Intel's SIMD may be ugly, but it is also very topologically easy to navigate, if that makes sense.
I'm going to write some Arm SVE code and compare, at some point.
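For comparison, here is roughly what writing the intrinsics by hand looks like for the same kind of loop. This is only a sketch under some assumptions: it uses the SSE intrinsics from `<xmmintrin.h>`, processes four floats per step, and falls back to a scalar loop on targets without SSE. The name `saxpy_sse` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: _mm_loadu_ps, _mm_mul_ps, ... */
#endif

/* Computes y[i] = a * x[i] + y[i] for i in [0, n), four floats at a time
 * where SSE is available. The name saxpy_sse is illustrative only. */
void saxpy_sse(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);

    size_t i = 0;
#if defined(__SSE__)
    const __m128 va = _mm_set1_ps(a);            /* broadcast a to 4 lanes */
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);         /* unaligned load of x[i..i+3] */
        __m128 vy = _mm_loadu_ps(y + i);         /* unaligned load of y[i..i+3] */
        vy = _mm_add_ps(_mm_mul_ps(va, vx), vy); /* a * x + y, lane by lane */
        _mm_storeu_ps(y + i, vy);                /* write the 4 results back */
    }
#endif
    /* Scalar tail: leftover elements, or the whole array without SSE. */
    for (; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Every load, multiply, add, and store is spelled out, so the mapping from source to instructions is direct, at the cost of tying the code to one fixed vector width.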