by less_less 1086 days ago
Yeah, those vector permute instructions are super useful for both patterns. There are dedicated instructions for some specific permutations (shifting by a constant number of bytes, and some interleavings), but you can easily end up needing the general case. And of course parallel LUTs are also very useful. Depending on what you're doing, you could well end up with both in the same algorithm.
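To make the two uses concrete, here's a rough scalar model in Python of a 16-byte permute, not real SIMD code. I'm assuming x86 PSHUFB-style semantics (index high bit set gives 0, otherwise the low 4 bits select a byte); other ISAs handle out-of-range indices differently. The names `permute16`, `reverse_bytes` and `hex_encode16` are made up for illustration.

```python
def permute16(table: bytes, idx: bytes) -> bytes:
    """Scalar model of a 16-byte vector permute: out[i] = table[idx[i]].

    Assumption: PSHUFB-like conventions. An index byte with its high bit
    set yields 0; otherwise only the low 4 bits are used.
    """
    assert len(table) == 16 and len(idx) == 16
    return bytes(0 if i & 0x80 else table[i & 0x0F] for i in idx)


# Use 1: general permutation. The data is the "table" operand and a
# constant index vector encodes the shuffle (here: reverse 16 bytes).
REVERSE = bytes(range(15, -1, -1))


def reverse_bytes(block: bytes) -> bytes:
    return permute16(block, REVERSE)


# Use 2: parallel LUT. The table is the constant and the data supplies
# the indices, so one permute performs 16 lookups.
HEX_DIGITS = b"0123456789abcdef"


def hex_encode16(block: bytes) -> bytes:
    """Hex-encode 8 input bytes into 16 ASCII characters."""
    assert len(block) == 8
    # Split each byte into its high and low nibble, in that order.
    nibbles = bytes(n for b in block for n in (b >> 4, b & 0x0F))
    return permute16(HEX_DIGITS, nibbles)
```

Same operation both times; the only difference is which operand is the constant. In real SIMD code the nibble split in `hex_encode16` would itself probably be done with shifts, masks and an interleave, which is one way both uses can land in the same routine.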