by pbsd 696 days ago
AVX-512 allows arbitrary shuffles, e.g., shuffle the 64 bytes in zmm0 with indices from zmm1 into zmm2 (VPERMB, part of the AVX-512 VBMI extension). Simple shuffles such as unpacks aren't really an issue.
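
To make the semantics concrete, here is a minimal scalar sketch in Python of what such a full-width byte shuffle computes. It is only a model of the behavior, not how the hardware does it; the function name `permute_bytes` and the variables named after registers are made up for illustration.

```python
def permute_bytes(data: bytes, idx: bytes) -> bytes:
    """Scalar model of a 64-byte shuffle: out[i] = data[idx[i] mod 64].

    Only the low 6 bits of each index byte are used, so every output
    byte can come from any of the 64 input bytes.
    """
    assert len(data) == 64 and len(idx) == 64
    return bytes(data[i & 63] for i in idx)


# Hypothetical register contents, named after the registers in the comment.
zmm0 = bytes(range(100, 164))        # 64 data bytes
zmm1 = bytes(reversed(range(64)))    # indices 63, 62, ..., 0 (reverse the vector)
zmm2 = permute_bytes(zmm0, zmm1)     # result
```

Any permutation, broadcast, or duplication of bytes is just a different index vector; the cost is the same regardless of the pattern.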

Worse yet (for wiring complexity or required uops, anyway), AVX-512 also has shuffles with two data inputs (VPERMI2B/VPERMT2B at byte granularity), i.e. each of the 64 result bytes can come from any of 128 different input bytes, selected by the indices in a third 64-byte register.
Which is also why it's so attractive. :)
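
A rough scalar sketch of the two-input form, again in Python and again only a model of the semantics: the two data registers behave like one 128-byte table, and bit 6 of each index byte picks which register the byte comes from. The name `permute2_bytes` and the example values are hypothetical.

```python
def permute2_bytes(a: bytes, idx: bytes, b: bytes) -> bytes:
    """Scalar model of a two-source 64-byte shuffle.

    Index bits 0..5 select the byte position, bit 6 selects the source
    (0 -> a, 1 -> b), bit 7 is ignored. Equivalent to indexing a + b.
    """
    assert len(a) == len(b) == len(idx) == 64
    table = a + b                       # 128 candidate bytes
    return bytes(table[i & 127] for i in idx)


a = bytes(range(0, 64))                 # first data input
b = bytes(range(128, 192))              # second data input
# Interleave the low halves of a and b: a[0], b[0], a[1], b[1], ...
idx = bytes(k // 2 + 64 * (k % 2) for k in range(64))
out = permute2_bytes(a, idx, b)
```

The interleave here is what an unpack would do, but the same operation handles any mix of bytes from the two inputs.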

Those large shuffles are really powerful for things like lookup tables. Large tables are suddenly way more feasible in-register, letting you replace a costly gather with an in-register permute.
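
As an illustration of the lookup-table use, here is a small Python sketch under the assumption that the table has exactly 128 one-byte entries, so it fits in two 64-byte registers. The example table (ASCII uppercasing) and the names `TABLE` and `lut128` are chosen for illustration only; each input byte acts as an index, so 64 lookups happen in one permute instead of 64 memory accesses through a gather.

```python
# 128-entry byte table: maps each 7-bit ASCII code to its uppercase form.
TABLE = bytes(ord(chr(c).upper()) for c in range(128))


def lut128(table: bytes, indices: bytes) -> bytes:
    """Model of a 128-entry in-register lookup via a two-source permute.

    The table is split across two 64-byte "registers"; bit 6 of each
    index picks the half. Bit 7 is ignored, so non-ASCII input bytes
    would alias onto ASCII entries and need separate handling.
    """
    assert len(table) == 128 and len(indices) == 64
    lo, hi = table[:64], table[64:]
    return bytes((hi if i & 64 else lo)[i & 63] for i in indices)


# 64 input bytes (padded with spaces), each used directly as a table index.
text = b"large tables are suddenly way more feasible in-register".ljust(64)
upper = lut128(TABLE, text)
```

Tables larger than 128 bytes would need several permutes plus a blend on the high index bits, which this sketch does not cover.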