Hacker News
by Validark 27 days ago
If we want to improve cross-platform SIMD, in my opinion we should start by supporting more operations in LLVM IR: vector expansion (currently we only have `expandload`), shuffles whose indices are only known at runtime, and pdep/pext operations.
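For readers unfamiliar with these operations, here is a minimal scalar reference of what they compute, written in C. It assumes 64-bit masks for pdep/pext (matching x86 BMI2's `PDEP`/`PEXT`) and, for expand, an 8-lane byte vector whose inactive lanes are zeroed (similar to AVX-512's zero-masked expand instructions). The `ref_*` names are hypothetical; this is not LLVM IR and not how a backend would lower these operations, only a statement of their semantics.

```c
#include <stddef.h>
#include <stdint.h>

/* PDEP: deposit the low bits of `src`, in order, into the positions
   of the set bits of `mask` (lowest set bit first). */
static uint64_t ref_pdep(uint64_t src, uint64_t mask) {
    uint64_t out = 0;
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        uint64_t lowest = mask & -mask;   /* lowest set bit of mask */
        if (src & bit)
            out |= lowest;
        mask &= mask - 1;                 /* clear that mask bit */
    }
    return out;
}

/* PEXT: extract the bits of `src` selected by `mask` and pack them
   contiguously into the low bits of the result. */
static uint64_t ref_pext(uint64_t src, uint64_t mask) {
    uint64_t out = 0;
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        uint64_t lowest = mask & -mask;
        if (src & lowest)
            out |= bit;
        mask &= mask - 1;
    }
    return out;
}

/* Expand (register-to-register): consecutive elements of `src` are
   placed, in order, into the lanes whose bit is set in `mask`;
   all other lanes are zeroed. */
static void ref_expand8(const uint8_t src[8], uint8_t mask, uint8_t out[8]) {
    size_t j = 0;
    for (size_t i = 0; i < 8; i++)
        out[i] = (mask >> i) & 1 ? src[j++] : 0;
}
```

For example, `ref_pext(0x12345678, 0xFF00FF00)` gathers bytes 1 and 3 into `0x1256`, and expanding `{1, 2, 3, 4, ...}` under mask `0xB2` yields `{0, 1, 0, 0, 2, 3, 0, 4}`.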

Also, let's stop with the "vector length agnostic" types being the sole option for SVE extensions. I'd rather write an optimized routine for the 16-byte machine I'm targeting and be able to upgrade it in 5 years than have "agnostic" code that pretends it would work amazingly on all platforms, when the machine it was optimized for is theoretical. I'm fine with recompiling my code, I do it every day. If I have an algorithm that's truly vector length agnostic, I can make the vector length a constant in my code that can change based on the compile target.
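A minimal sketch of the "vector length as a compile-time constant" approach, assuming GCC or Clang (it relies on their `vector_size` extension). `VLEN_BYTES` and `add_bytes` are hypothetical names; rebuilding with, e.g., `-DVLEN_BYTES=32` retargets the same source to a wider vector without touching the algorithm.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-target constant: e.g. 16 for SSE/NEON, 32 for AVX2,
   64 for AVX-512. Override at build time with -DVLEN_BYTES=N. */
#ifndef VLEN_BYTES
#define VLEN_BYTES 16
#endif

/* GCC/Clang generic vector type whose width follows the constant. */
typedef uint8_t vbytes __attribute__((vector_size(VLEN_BYTES)));

/* dst[i] = a[i] + b[i] (wrapping mod 256), VLEN_BYTES lanes at a time. */
static void add_bytes(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n) {
    size_t i = 0;
    for (; i + VLEN_BYTES <= n; i += VLEN_BYTES) {
        vbytes va, vb;
        memcpy(&va, a + i, VLEN_BYTES);   /* alignment-safe loads */
        memcpy(&vb, b + i, VLEN_BYTES);
        vbytes vs = va + vb;              /* lane-wise add, wraps per byte */
        memcpy(dst + i, &vs, VLEN_BYTES);
    }
    for (; i < n; i++)                    /* scalar tail for leftover bytes */
        dst[i] = (uint8_t)(a[i] + b[i]);
}
```

The trade-off is the one described above: each target width needs its own build, but every build is a plain fixed-width kernel the compiler can optimize fully.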

https://github.com/llvm/llvm-project/issues/113422

https://github.com/llvm/llvm-project/issues/172857

1 comment

> Also, let's stop with the "vector length agnostic" types being the sole option for SVE extensions

They aren't; see the `arm_sve_vector_bits` attribute, which (together with `-msve-vector-bits`) gives you fixed-length SVE types.

> I'm fine with recompiling my code, I do it every day

Then you can do that.

> If I have an algorithm that's truly vector length agnostic, I can make the vector length a constant in my code that can change based on the compile target.

You can do that, but why not simply write it in a vector-length-agnostic way?

IMO the better approach is to start by thinking about SIMD optimizations in a VLA way, and to specialize on the vector length when that becomes advantageous. Doing it this way is better even if you end up not writing VLA code, because you have already thought about the scalability problem.
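To make the VLA-first idea concrete, here is a scalar C model of a vector-length-agnostic loop. It is not real SVE code: `vl` stands in for a runtime lane count (the role SVE's `svcntw()` plays), and the per-lane `active` check models a `whilelt`-style predicate, so the tail needs no separate loop. The function names are hypothetical.

```c
#include <stddef.h>

/* VLA-style kernel: y[i] = a * x[i] + y[i]. The lane count `vl` is a
   runtime value; each outer iteration models one predicated vector op. */
static inline void saxpy_vla(float *y, const float *x, float a, size_t n, size_t vl) {
    for (size_t i = 0; i < n; i += vl) {
        for (size_t lane = 0; lane < vl; lane++) {
            int active = i + lane < n;    /* predicate: lanes past n are off */
            if (active)
                y[i + lane] = a * x[i + lane] + y[i + lane];
        }
    }
}

/* Specialization: once a fixed width pays off, pin vl to a constant
   (16 floats = 512 bits) and let the compiler unroll the lane loop. */
static void saxpy_fixed16(float *y, const float *x, float a, size_t n) {
    saxpy_vla(y, x, a, n, 16);
}
```

Because the kernel is expressed in terms of `vl` from the start, moving to a fixed width is a one-line specialization rather than a redesign.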

Many libraries currently don't scale beyond 128-bit, not because they couldn't make efficient use of wider vectors, but because the library was architected around 128-bit and changing that amounts to almost a full rewrite. So now you are stuck wasting three quarters of your vector ALU width running 128-bit SSE on Zen5.