by less_less
1086 days ago
Yeah, those vector permute instructions are super useful for both patterns. There are dedicated instructions for some specific permutations (shifting by a constant number of bytes, and some interleavings), but you can easily end up needing the general case. And of course parallel LUTs are also very useful. Depending on what you're doing, you could easily end up with both in the same algorithm.
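
To make the two patterns concrete, here is a rough scalar model in Python of a 16-lane byte shuffle. It assumes PSHUFB-style semantics (the low 4 bits of each index select a source byte, and a set high bit zeroes the lane); other ISAs differ in the details. The names `byte_shuffle`, `POPCOUNT4`, and the sample data are made up for illustration, and a real implementation would do this in one instruction rather than a loop.

```python
def byte_shuffle(table: bytes, indices: bytes) -> bytes:
    """Scalar model of a 16-lane byte shuffle (assumed PSHUFB-style semantics)."""
    assert len(table) == 16 and len(indices) == 16
    # Each output lane reads table[low 4 bits of its index];
    # an index with the high bit set produces a zero byte instead.
    return bytes(0 if i & 0x80 else table[i & 0x0F] for i in indices)


# Pattern 1: general permutation.
# The first operand is the data; the index vector says where each output byte comes from.
data = bytes(range(0x10, 0x20))
reverse = bytes(range(15, -1, -1))  # lane i takes source byte 15 - i
reversed_data = byte_shuffle(data, reverse)

# Pattern 2: parallel LUT.
# The first operand is a fixed 16-entry table; the values to look up go in the index operand.
POPCOUNT4 = bytes(bin(n).count("1") for n in range(16))  # bit count of each 4-bit value
nibbles = bytes([0xF, 0x0, 0x7, 0x1, 0x3, 0x8, 0x5, 0xA,
                 0xC, 0x2, 0xE, 0x4, 0x6, 0x9, 0xB, 0xD])
counts = byte_shuffle(POPCOUNT4, nibbles)  # 16 lookups in one shuffle
```

Same operation both times; the only difference is which operand you think of as the data and which as the constant.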