by chapplap 2256 days ago

That's not exactly true once you have to deal with vector extensions like AVX-512. It's quite a pain to write by hand (C intrinsics), and many of the ways to abstract it away end up giving you a GPU-like programming model (e.g. Intel ISPC).

Plus, this has largely been tried before with Xeon Phi and it didn't end so well.

Huge vector units like AVX-512 are mainly useful for workloads that need huge amounts of RAM that you just can't get with a GPU, or for workloads that are very latency sensitive and incompatible with GPU task scheduling because they are in some other CPU-bound code.

2 comments

imtringued 2256 days ago

>Huge vector units like AVX-512 are mainly useful for workloads that need huge amounts of RAM that you just can't get with a GPU, or for workloads that are very latency sensitive and incompatible with GPU task scheduling because they are in some other CPU-bound code.

There are a lot of tasks that a GPU can do faster than a CPU, but that requires batching a large amount of work before you gain a speedup. EPYC CPUs do not suffer from that limitation. If all you have is an array with 4 elements, you can straight up run the vector instructions and then immediately switch back to scalar code. Meanwhile, with a GPU you probably need an array with at least 10000 elements.
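
To make the 4-element case concrete, here is a minimal sketch in C. It assumes a GCC- or Clang-compatible compiler (it uses their `vector_size` extension rather than vendor intrinsics), and the helper name `scale_offset_sum4` is made up for illustration. The point is only that the vector work sits inline between ordinary scalar statements, with no batching or device transfer.

```c
#include <assert.h>
#include <string.h>

/* GCC/Clang vector extension: four floats packed into one 16-byte value. */
typedef float v4f __attribute__((vector_size(16)));

/* Hypothetical helper: compute in[i] * scale + offset for four values
 * with vector arithmetic, then reduce the result to a single scalar. */
float scale_offset_sum4(const float in[4], float scale, float offset)
{
    assert(in != NULL);

    v4f v;
    memcpy(&v, in, sizeof v);                 /* load without alignment assumptions */

    const v4f vs = {scale, scale, scale, scale};
    const v4f vo = {offset, offset, offset, offset};
    v = v * vs + vo;                          /* element-wise multiply and add */

    /* Straight back to scalar code: sum the four lanes. */
    return v[0] + v[1] + v[2] + v[3];
}
```

Calling it with `{1, 2, 3, 4}`, a scale of 2 and an offset of 1 sums 3 + 5 + 7 + 9.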

jiggawatts 2256 days ago

> It's quite a pain to write by hand

And we all know that autovectorisation is hit-and-miss at best.

I wonder if there will be a new C-like language that has portable SIMD-like capabilities in the same sense that "C is a portable assembly language".
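
For what it's worth, GCC and Clang already offer something in this direction as a C extension. The sketch below assumes one of those compilers; the function name `saxpy` and the choice of an 8-float vector are illustrative, not taken from any particular library. The same source compiles for different targets, and the compiler picks whatever vector instructions the target has (or falls back to narrower or scalar code).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Eight floats per vector; how this is lowered depends on the target. */
typedef float v8f __attribute__((vector_size(32)));

/* Illustrative kernel: y[i] = a * x[i] + y[i] for i in [0, n). */
void saxpy(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);

    const v8f va = {a, a, a, a, a, a, a, a};
    size_t i = 0;

    /* Main loop: process eight elements per iteration. */
    for (; i + 8 <= n; i += 8) {
        v8f vx, vy;
        memcpy(&vx, x + i, sizeof vx);   /* unaligned-safe loads */
        memcpy(&vy, y + i, sizeof vy);
        vy = va * vx + vy;               /* element-wise arithmetic */
        memcpy(y + i, &vy, sizeof vy);   /* store the result */
    }

    /* Scalar tail for the remaining n % 8 elements. */
    for (; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

This is not standard C, so it is portable across instruction sets rather than across compilers.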

DeathArrow 2255 days ago

>The first observation is that in modern compilers, the resulting performance from auto-vectorization optimization is still far from the architectural peak performance. https://dl.acm.org/doi/fullHtml/10.1145/3356842

Maybe we can get more benefits if we invest more resources in optimizing compilers than in inventing yet another JavaScript framework?

Fronzie 2256 days ago

Another one is SYCL (https://www.khronos.org/sycl/): the SIMD part is OpenCL, and SYCL provides scheduling and memory transfer on top.

mmozeiko 2256 days ago

There is: the Intel Implicit SPMD Program Compiler (ISPC). https://ispc.github.io/

benibela 2255 days ago

But does that work on AMD?
