by mlochbaum 25 days ago
This is one of the best uses I've found for Singeli[0]. Here's how I implemented an AVX2 transpose kernel similar to the one in transpose_Vec256_kernel, generic over element type and vector/kernel size (I use unpack instructions rather than shuffle and blend, which I suspect is faster since each interaction between two vectors takes just one instruction):

https://github.com/dzaima/CBQN/blob/v0.11.0/src/singeli/src/...
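
For readers who haven't seen an unpack-based transpose, here is a rough Python model of the general idea. It is not the linked Singeli kernel: it treats each vector as a plain list and assumes a whole-vector unpack (interleaving the low or high halves of two vectors), whereas the real AVX2 unpack instructions work within 128-bit lanes. The names unpack_lo, unpack_hi and transpose_unpack are made up for this sketch.

```python
def unpack_lo(a, b):
    """Interleave the low halves of a and b: a0 b0 a1 b1 ..."""
    half = len(a) // 2
    return [x for pair in zip(a[:half], b[:half]) for x in pair]


def unpack_hi(a, b):
    """Interleave the high halves of a and b."""
    half = len(a) // 2
    return [x for pair in zip(a[half:], b[half:]) for x in pair]


def transpose_unpack(rows):
    """Transpose an n-by-n matrix (n a power of two) in log2(n) rounds.

    Each round pairs row i with row i + n/2 and replaces the pair with
    its low and high unpack, so every pair costs two unpack operations.
    """
    n = len(rows)
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    assert all(len(r) == n for r in rows), "matrix must be square"
    for _ in range(n.bit_length() - 1):  # log2(n) rounds
        half = n // 2
        nxt = []
        for i in range(half):
            nxt.append(unpack_lo(rows[i], rows[i + half]))
            nxt.append(unpack_hi(rows[i], rows[i + half]))
        rows = nxt
    return rows
```

Each round rotates the bits of every element's flat index by one position, so after log2(n) rounds the row and column bits have swapped places, which is exactly a transpose.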

The language is oriented towards compile-time array programming instead of managing a bunch of individual vectors. So you have runtime vec_select{} (docs at [1]), mirrored by compile-time select{}, and the indices generated by pairs{} can be used in either.
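
As a loose illustration of why it helps for one index list to work with both kinds of select, here is a Python analogy. It does not reproduce the semantics of Singeli's select{}, vec_select{} or pairs{}; select and interleave_indices are hypothetical helpers, and "compile time" is modeled simply as index lists computed before any data is touched.

```python
def select(values, indices):
    """Gather values[i] for each i in indices."""
    return [values[i] for i in indices]


def interleave_indices(n):
    """Indices into the concatenation of two length-n vectors that
    interleave them: 0, n, 1, n+1, ..."""
    return [j * n + i for i in range(n) for j in range(2)]


idx = interleave_indices(4)

# Applied to data (the "runtime" use).
a = [10, 11, 12, 13]
b = [20, 21, 22, 23]
interleaved = select(a + b, idx)

# Applied to another index list (the "compile-time" use): composing
# two shuffles ahead of time yields a single shuffle to run on data.
idx_twice = select(idx, idx)
```

Because select(select(x, idx), idx) equals select(x, select(idx, idx)), a chain of shuffles can be folded into one index list before it is applied to any vector.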

[0] https://github.com/mlochbaum/Singeli/

[1] https://github.com/mlochbaum/Singeli/tree/master/include#sim...