by garmaine 1654 days ago
That you can program x86 SIMD in a high-level language and find it intuitive to use is remarkable. It was not designed for that, and it exposes primitives which make more sense for assembly-level and automatic compiler optimization. Cray, on the other hand, has a vector ISA that is explicitly designed for integration into high-level languages. You basically write a "for loop"-like instruction which sets up the source and dest arrays, then do regular scalar operations (add, multiply, etc.) using virtual source & destination registers. It feels just like coding the CPU's integer ALU. The CPU itself handles the magic of parallelizing these operations across its vector units.
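
The "for loop"-like setup described above is usually called strip-mining. Here is a rough sketch in plain C that emulates the idea rather than using any real Cray toolchain. `MAX_VL` (64 elements, the Cray-1 vector register length) is an assumption, and `set_vl` and `vadd` are hypothetical stand-ins for the vector-length register and a single vector add instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64 elements per vector register, as on the Cray-1. */
#define MAX_VL 64

/* Hypothetical stand-in for setting the vector-length register:
   the hardware handles at most MAX_VL elements per instruction. */
static size_t set_vl(size_t remaining) {
    return remaining < MAX_VL ? remaining : MAX_VL;
}

/* Hypothetical stand-in for one vector add instruction over vl elements. */
static void vadd(double *dst, const double *a, const double *b, size_t vl) {
    for (size_t i = 0; i < vl; i++)
        dst[i] = a[i] + b[i];
}

/* Strip-mined loop: the same code handles any n, with no separate tail loop. */
void vector_add(double *dst, const double *a, const double *b, size_t n) {
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t done = 0; done < n; ) {
        size_t vl = set_vl(n - done);   /* last strip may be shorter */
        vadd(dst + done, a + done, b + done, vl);
        done += vl;
    }
}
```

The loop body never mentions a fixed register width, which is the part that makes this style feel like ordinary scalar code.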

So, sorry, I was a bit confused, because Cray/RISC-V is explicitly designed to be easy to use from high-level languages in a way that x86's SSE et al. were not. So I thought maybe you had mixed up the two in your question or something. But I guess you just haven't had the pleasure of working with a Cray before!

Can you show me any source code of a Cray being programmed in a high-level language? What you describe sounds like a higher-level ISA, but that has to be easily expressed in C++ for the argument to hold (or alternatively in any high-level language available at the time).

The thing with the Intel model is that although the programming model in the abstract is probably worse (although I'm curious whether it allows a wider processor), it is conceptually trivial to use if you understand roughly which instructions you want, i.e. it's just a blob as far as the compiler is concerned.

The compiler I work on supports Intel SIMD, but I'm not sure it could easily be made to get the most out of a vector programming model without a lot of rewrites. It could, however, basically emulate the fixed-width things in terms of a vector ISA if need be.

See the #pragma directives in this document:

http://www.audentia-gestion.fr/CRAY/PDF/Cray_C_and_C___Refer...

You literally just write regular old C code doing a tight inner-loop computation, and use pragmas to tell the compiler what it needs to safely parallelize.
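
As a rough illustration of that style, here is a minimal sketch of a tight loop with a vectorization hint. It assumes the `#pragma _CRI ivdep` spelling and the `_CRAYC` predefined macro as given in Cray's C manuals, so check both against the linked document. The function name `saxpy` is just an illustrative choice. On other compilers the guard drops the pragma and the loop is plain C.

```c
#include <assert.h>
#include <stddef.h>

/* y[i] += alpha * x[i]; the caller promises x and y do not overlap. */
void saxpy(float *y, const float *x, float alpha, size_t n) {
    assert(y != NULL && x != NULL);
#if defined(_CRAYC)
    /* Assumed Cray C directive: ignore presumed vector dependencies
       in the loop that follows. */
#pragma _CRI ivdep
#endif
    for (size_t i = 0; i < n; i++)
        y[i] += alpha * x[i];
}
```

The loop itself is unchanged from what you would write for a scalar machine. The pragma only supplies the no-aliasing guarantee the compiler cannot prove on its own.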

Of course these days you can do the same thing in any vectorizing compiler. But the point is that a modern vectorizing compiler has to do some pretty impressive transformations to generate SIMD code which looks nothing like the original, whereas the Cray code pretty much compiles to the same thing when vectorized.

I think we've been talking past each other (I should have been more direct, perhaps). This programming model is not what I had in mind. I completely accept your point about the Cray model, but I meant something slightly dumber than loop parallelization.

In very tight situations it's common to write SIMD intrinsics directly rather than rely on the compiler's ability to make the transformations itself. Intel's SIMD may be ugly, but it is also very topologically easy to navigate, if that makes sense.
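
For comparison, a minimal sketch of what writing the intrinsics directly can look like, using the SSE calls `_mm_loadu_ps`, `_mm_add_ps` and `_mm_storeu_ps` from `<xmmintrin.h>`. The function name `add_f32` is illustrative, and the `__SSE__` guard with a scalar fallback is an assumption added so the snippet still builds on non-x86 targets.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* dst[i] = a[i] + b[i], four floats at a time where SSE is available. */
void add_f32(float *dst, const float *a, const float *b, size_t n) {
    assert(dst != NULL && a != NULL && b != NULL);
    size_t i = 0;
#if defined(__SSE__)
    /* Main loop: one 128-bit register holds four floats. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* unaligned load */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar tail (and the whole loop on non-SSE targets). */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

The fixed width of four and the explicit tail loop are exactly what the vector-length model avoids, but each intrinsic maps to one obvious instruction, which is the "easy to navigate" part.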

I'm going to write some Arm SVE code and compare, at some point.

Yeah, the point with the Cray vector opcodes is that the SIMD instructions ARE the regular scalar/FPU instructions. The compiler doesn’t do anything too different when it vectorizes the loop. It is the CPU that does the vector optimization. The code just provides hints (to the CPU, not the compiler) about how to vectorize.