by pjmlp 2172 days ago
The main problem is software, with GPGPUs you need to explicitly program for them, while with stuff like AVX there is this implicit hope that you just code as always and the compiler will take care of the rest via auto-vectorization and PhD level optimization algorithms.

Because outside artificial intelligence, graphics and audio, there is little else that common applications would use the GPGPU for, so the large majority of software developers keeps ignoring heterogeneous programming models.

4 comments

> the compiler will take care of the rest via auto-vectorization and PhD level optimization algorithms.

With how AVX512 is implemented, there isn't much point in a compiler automatically optimizing general-purpose code to use it, because even if there is a theoretical speedup, it may well be slower in practice.

There might not be one, but all major C, C++, Hotspot, Graal and RyuJIT compilers do it to some extent.
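
As a rough illustration of the "code as always" case discussed above, here is a minimal sketch of a loop that mainstream C compilers can usually auto-vectorize. The function name `scale_add` is hypothetical. Whether vector instructions are actually emitted depends on the compiler, flags and target, so this is a sketch under those assumptions, not a guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: a plain scalar loop with no intrinsics.
   `restrict` promises the compiler that dst and src do not overlap,
   which removes one common obstacle to auto-vectorization. */
void scale_add(float *restrict dst, const float *restrict src, float k, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++) {
        dst[i] += k * src[i];   /* independent iterations, unit stride */
    }
}
```

With GCC, `-O3 -fopt-info-vec` reports which loops were vectorized; with Clang, `-O3 -Rpass=loop-vectorize` does the same.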
> while with stuff like AVX there is this implicit hope that you just code as always and the compiler will take care of the rest via auto-vectorization and PhD level optimization algorithms

No. I recently could really, really have used the packed saturating integer arithmetic and horizontal addition in AVX2 (but my old machine doesn't support it) and, even better, the same but 512 bits wide on AVX512. It would only have been 6 or 7 instructions, if that, but it was in an inner loop, and it mattered. Using compiler intrinsics would have been fine. I think you're looking at things too narrowly.
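
For readers unfamiliar with the operations mentioned above, here is a minimal sketch of a packed saturating 16-bit add followed by a horizontal sum. The commenter's actual code is not shown in the thread, so the function names `sat_add16` and `sat_add_sum` and the choice of 16-bit lanes are assumptions. The AVX2 path is compiled only when the compiler defines `__AVX2__` (for example with `-mavx2`); otherwise the scalar fallback runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#ifdef __AVX2__
#include <immintrin.h>
#endif

/* Scalar reference: 16-bit add that clamps instead of wrapping. */
static int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t s = (int32_t)a + (int32_t)b;
    if (s > INT16_MAX) return INT16_MAX;
    if (s < INT16_MIN) return INT16_MIN;
    return (int16_t)s;
}

/* Hypothetical helper: sum over i of saturating(a[i] + b[i]).
   Assumes the total fits in int32_t. */
int32_t sat_add_sum(const int16_t *a, const int16_t *b, size_t n)
{
    assert(a != NULL && b != NULL);
    int32_t total = 0;
    size_t i = 0;
#ifdef __AVX2__
    const __m256i ones = _mm256_set1_epi16(1);
    __m256i acc = _mm256_setzero_si256();
    for (; i + 16 <= n; i += 16) {
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
        __m256i vs = _mm256_adds_epi16(va, vb);   /* 16 saturating adds */
        /* Multiply by 1 and add adjacent pairs: widens to 8 x int32. */
        acc = _mm256_add_epi32(acc, _mm256_madd_epi16(vs, ones));
    }
    /* Reduce the 8 int32 lanes to a single value. */
    __m128i lo = _mm256_castsi256_si128(acc);
    __m128i hi = _mm256_extracti128_si256(acc, 1);
    __m128i s = _mm_add_epi32(lo, hi);
    s = _mm_hadd_epi32(s, s);
    s = _mm_hadd_epi32(s, s);
    total = _mm_cvtsi128_si32(s);
#endif
    for (; i < n; i++) {                          /* scalar tail or fallback */
        total += sat_add16(a[i], b[i]);
    }
    return total;
}
```

The vector loop body is only a handful of instructions, which is roughly the scale the comment describes.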

I am looking at it from the point of view of the Joe/Jane developer who cannot tell head from tail regarding vector programming, doesn't even know what compiler intrinsics are for, and uses languages that don't expose them anyway.

Well, those people will never be getting the most out of their CPUs to begin with.

Which is the whole point of "this implicit hope that you just code as always and the compiler will take care of the rest via auto-vectorization and PhD level optimization algorithms": not only do those people not get it, there is also a general decline in the use of languages that expose vector intrinsics, like C and C++, for regular LOB applications.

In my ideal world you'd be able to mark a function "this should compile to / run on GPGPU" and the compiler would potentially tell you why it can't do that. I'm not even sure anything is stopping us from implementing that apart from the effort required. Sure, many ways to write that code will result in terrible performance, but it would still be closer to the auto-vectorisation experience.

Actually we already have OpenMP to CUDA (http://www2.engr.arizona.edu/~ece569a/Readings/GPU_Papers/3....), so just making it more production-ready would be perfect.

The current OpenMP spec has GPU offload features specifically for what was expected of the Sierra supercomputer. I'm not sure how relevant a paper that old (relatively, I hasten to add) is.
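
To make the offload discussion above concrete, here is a minimal sketch using the OpenMP `target` construct. The function name `saxpy_offload` is hypothetical. Whether the loop really runs on a GPU depends on an offload-capable compiler and runtime. If OpenMP is not enabled, the pragma is ignored and the loop runs serially on the host. If it is enabled but no device is available, implementations typically fall back to the host.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: y = a * x + y, with the loop marked for offload. */
void saxpy_offload(float a, const float *x, float *y, int n)
{
    assert(x != NULL && y != NULL && n >= 0);
    /* map(to: ...) copies x to the device; map(tofrom: ...) copies y
       to the device and back to the host afterwards. */
    #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

This is close to the "mark a function to run on GPGPU" idea raised earlier in the thread, except that the annotation sits on a loop rather than on a whole function.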
> Because outside artificial intelligence, graphics and audio, there is little else that common applications would use the GPGPU for, so the large majority of software developers keeps ignoring heterogeneous programming models.

I think you got this backwards - the lack of developers' interest is what leads to the mistaken impression that GPU compute is only good for multimedia and FP-crunching workloads. Even looking at the success of GPU compute in mining cryptocoins (only ASICs do better) ought to be enough to tell you that we could do a lot more with them if we cared to.

From my point of view cryptomining is a useless fad, and typical line of business applications don't need anything more than what I listed.