HN Mirror

by whizzter 25 days ago

Agreed, fixed-width vectors need to be a language feature: that would give better compile times and would solve the issues for most people.

Personally, I think something like Clang's way of adding GLSL-like vectors and semantics would've gone a long way. SVE might be an elegant design, but in reality there is probably several times more game and other 3D code being written that needs vectors than code in other fields, and there the limited vector sizes aren't really a problem.

And honestly, considering the story of AVX-512, with 512-bit vectors being removed from mainstream CPUs by Intel, do we really, really need longer ones, even if they come from a "scalable" design?


adrian_b 25 days ago

Intel has been forced to reintroduce 512-bit vectors into the mainstream because of competition from AMD.

Starting with the Intel Nova Lake CPUs, around the end of this year, all future AMD and Intel CPUs will provide 512-bit vectors, as the current AMD Zen 4 and Zen 5 CPUs already do.

The 512-bit vector length is more convenient than other lengths because on AMD and Intel CPUs it coincides with the length of a cache line (64 bytes). Because of this, it is easier to optimize vector code and cache usage simultaneously.

For GPUs, which favor throughput over latency, 1024-bit and 2048-bit vector register widths are frequently used. For CPUs, widths greater than 512 bits are unlikely to be useful, because the vector operations that belong on a CPU are those for which the high latency of dispatching to a GPU is undesirable.


camel-cdr 25 days ago

SIMD wider than 512 bits isn't relevant for regular general-purpose processors, neither currently nor in the near future.

But for smaller, more specialized CPUs in embedded or automotive use cases, wider vectors can give you more parallel compute while keeping the software model simpler than dispatching to a GPU.

Specifically, a design like https://saturn-vectors.org/#_short_vector_execution, which likes to use vectors 2x or 4x wider than the datapath for more efficient chaining. I quite like that design, because you can get high utilization and limited out-of-order execution without vector register renaming.


camel-cdr 25 days ago

In GPUs, GLSL-like types compile down to what is basically variable-length SIMD. A vec4 doesn't get compiled to one SIMD vector with four floats, but rather to four SIMD vectors, each containing N FP32 elements (usually 32 or 64).

Look at what this simple shader compiles down to in RGA (Radeon GPU Analyzer): https://godbolt.org/z/4GrfY61vf


whizzter 24 days ago

Right, and AVX-512 would thus be more relevant if ISPC-like features were mainstreamed in C++ compilers targeting CPUs.
