by garmaine 1663 days ago

See the #pragma directives in this document:

http://www.audentia-gestion.fr/CRAY/PDF/Cray_C_and_C___Refer...

You literally just write regular old C code doing a tight inner-loop computation, and use pragmas to tell the compiler what it needs to safely parallelize.
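To make that concrete, here is a minimal sketch of the "plain loop plus a pragma" style. It is not taken from the Cray manual: the function name `saxpy` is made up for illustration, and OpenMP's `#pragma omp simd` stands in for the Cray directives, whose exact spelling is not reproduced here. The `restrict` qualifiers and the pragma together are the programmer's promise that iterations are independent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel for illustration: y[i] = a * x[i] + y[i]. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(x != NULL && y != NULL);

    /* Stand-in directive (OpenMP, not Cray): asserts that the loop has no
     * loop-carried dependence, so iterations may be executed in vector lanes.
     * A compiler built without OpenMP support simply ignores the pragma and
     * the loop still runs correctly as scalar code. */
    #pragma omp simd
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The loop body is unchanged from what you would write for a scalar machine; only the hint is added.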

Of course these days you can do the same thing in any vectorizing compiler. But the point is that a modern vectorizing compiler has to do some pretty impressive transformations to generate SIMD code which looks nothing like the original, whereas the Cray code pretty much compiles to the same thing when vectorized.

1 comments

mhh__ 1662 days ago

I think we've been talking past each other (perhaps I should have been more direct). This programming model is not what I had in mind. I completely accept your point about the Cray model, but I meant something slightly dumber than loop parallelization.

In very tight situations it's common to write SIMD intrinsics directly rather than rely on the compiler's ability to make the transformations itself. Intel's SIMD may be ugly, but it is also very topologically easy to navigate, if that makes sense.
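For a rough idea of what writing intrinsics directly looks like, here is a small sketch using Intel's SSE intrinsics from `<xmmintrin.h>`. The function name `add_f32` is hypothetical, and the `__SSE__` guard is an assumption about the compiler (GCC and Clang define it on x86); on other targets the code falls back to the scalar loop.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, ... */
#endif

/* Hypothetical helper for illustration: out[i] = a[i] + b[i]. */
void add_f32(size_t n, const float *a, const float *b, float *out)
{
    assert(a != NULL && b != NULL && out != NULL);

    size_t i = 0;
#if defined(__SSE__)
    /* Process four floats per iteration with explicit 128-bit operations.
     * Unaligned loads and stores are used so callers need no alignment. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar tail (or the whole loop when SSE is unavailable). */
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Here the vector width, the loads, and the tail handling are all spelled out by hand, which is exactly the work a vectorizing compiler would otherwise have to infer.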

I'm going to write some Arm SVE code and compare at some point.

garmaine 1662 days ago

Yeah, the point with the Cray vector opcodes is that the SIMD instructions ARE the regular scalar/FPU instructions. The compiler doesn’t do anything too different when it vectorizes the loop. It is the CPU that does the vector optimization. The code just provides hints (to the CPU, not the compiler) about how to vectorize.