| HN Mirror



	by pjmlp 2172 days ago
	The main problem is software: with GPGPUs you need to explicitly program for them, while with stuff like AVX there is this implicit hope that you just code as always and the compiler will take care of the rest via auto-vectorization and PhD level optimization algorithms. Because outside artificial intelligence, graphics and audio, there is little else that common applications would use the GPGPU for, so the large majority of software developers keeps ignoring heterogeneous programming models.

4 comments

teruakohatu 2172 days ago

> the compiler will take care of the rest via auto-vectorization and PhD level optimization algorithms.

With how AVX512 is implemented, there isn't much point in a compiler auto-vectorizing general-purpose code to use it, because even if there is a theoretical speedup, it may well be slower in practice.


pjmlp 2172 days ago

There might not be one, but all major C, C++, Hotspot, Graal and RyuJIT compilers do it to some extent.
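To illustrate the "code as always" case discussed here, the following is a minimal sketch of a loop with no intrinsics that compilers such as GCC and Clang can typically auto-vectorize at higher optimization levels. The function name `add_arrays` is hypothetical, and whether a given compiler actually vectorizes it depends on the flags and target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: element-wise sum written as plain scalar code.
   `restrict` promises the compiler that the arrays do not overlap,
   which removes one common obstacle to auto-vectorization. */
void add_arrays(size_t n,
                const int32_t *restrict a,
                const int32_t *restrict b,
                int32_t *restrict out)
{
    assert(n == 0 || (a != NULL && b != NULL && out != NULL));
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

To check what the compiler did, GCC's `-fopt-info-vec` and Clang's `-Rpass=loop-vectorize` report which loops were vectorized.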


throwaway_pdp09 2172 days ago

> while with stuff like AVX there is this implicit hope that you just code as always and the compiler will take care of the rest via auto-vectorization and PhD level optimization algorithms

No. I recently could really, really have used the packed saturating integer arithmetic and horizontal addition in AVX2 (but my old machine doesn't support it) and, even better, the same but 512 bits wide on AVX512. It would only have been 6 or 7 instructions, if that, but it was in the inner loop, and it mattered. Using compiler intrinsics would have been fine. I think you're looking at things too narrowly.
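As a rough sketch of what such an inner loop can look like with compiler intrinsics, the example below uses the 128-bit SSE2 equivalents rather than AVX2 or AVX512, so that it builds on any x86-64 machine; that substitution is an assumption made for portability, not what the comment describes. The helper name `sat_add_sum_u8` is hypothetical, and a scalar fallback covers non-x86 targets and leftover elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Hypothetical helper: adds a[i] + b[i] with unsigned 8-bit saturation
   (each result clamped to 255) and returns the sum of all results. */
uint32_t sat_add_sum_u8(const uint8_t *a, const uint8_t *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    uint32_t total = 0;
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* Packed saturating add of 16 unsigned bytes. */
        __m128i s = _mm_adds_epu8(va, vb);
        /* Sum of absolute differences against zero acts as a horizontal
           add: it yields one partial sum per 64-bit half. */
        __m128i sad = _mm_sad_epu8(s, zero);
        total += (uint32_t)_mm_cvtsi128_si32(sad)
               + (uint32_t)_mm_extract_epi16(sad, 4);
    }
#endif
    /* Scalar tail, also the full path when SSE2 is unavailable. */
    for (; i < n; i++) {
        unsigned s = (unsigned)a[i] + b[i];
        total += s > 255u ? 255u : s;
    }
    return total;
}
```

The AVX2 and AVX512 versions follow the same pattern with wider registers, but they need matching hardware and compiler flags.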


pjmlp 2172 days ago

I am looking at it from the point of view of joe/jane developer who cannot tell head from tail regarding vector programming, doesn't even know what compiler intrinsics are for, and uses languages that don't expose them anyway.


voldacar 2171 days ago

Well those people will never be getting the most out of their CPUs to begin with.


pjmlp 2171 days ago

Which is the whole point of "this implicit hope that you just code as always and the compiler will take care of the rest via auto-vectorization and PhD level optimization algorithms", because not only do those people not get it, there is also a general decline in using languages that expose vector intrinsics, like C and C++, for regular LOB applications.


viraptor 2172 days ago

In my ideal world you'd be able to mark a function "this should compile to / run on gpgpu" and the compiler would potentially tell you why it can't do that. I'm not even sure anything is stopping us from implementing that, apart from the effort required. Sure, many ways to write that code will result in terrible performance, but it would still be closer to the auto-vectorisation experience.

Actually we already have OpenMP to CUDA (http://www2.engr.arizona.edu/~ece569a/Readings/GPU_Papers/3....) so just making it more production-ready would be perfect.


gnufx 2171 days ago

The current OpenMP spec has GPU offload features specifically for what was expected of the Sierra supercomputer. I'm not sure how relevant a paper that old (relatively, I hasten to add) is.
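For readers unfamiliar with those offload features, here is a minimal sketch using the `target` construct that OpenMP has defined since version 4.0. The function name `saxpy_offload` is hypothetical. Whether the loop really runs on a GPU depends on a compiler built with offload support for the device; without it, or without OpenMP enabled at all, the same code simply runs on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel: y[i] = a * x[i] + y[i], marked for device offload. */
void saxpy_offload(size_t n, float a, const float *x, float *y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    /* map(to:) copies x to the device; map(tofrom:) copies y there and back.
       If OpenMP is disabled the pragma is ignored and this is a plain loop. */
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

This is close to the "mark it and let the compiler handle it" model asked for above, although the data movement still has to be spelled out in the `map` clauses.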


zozbot234 2172 days ago

> Because outside artificial intelligence, graphics and audio, there is little else that common applications would use the GPGPU for, so the large majority of software developers keeps ignoring heterogeneous programming models.

I think you got this backwards - the lack of developers' interest is what leads to the mistaken impression that GPU compute is only good for multimedia and FP-crunching workloads. Even looking at the success of GPU compute in mining cryptocoins (only ASICs do better) ought to be enough to tell you that we could do a lot more with them if we cared to.


pjmlp 2172 days ago

From my point of view cryptomining is a useless fad, and typical line of business applications don't need anything more than what I listed.
