by orlp 929 days ago
That's actually fair enough. I'm not certain how well the CPU will handle the overlapping SIMD reads/writes on the left-hand side.

If you have AVX512 available, shuffle(xs, lut[movemask(compare_result)]) could instead be done as two compressions with one reverse shuffle in between.
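One possible reading of that, sketched as a scalar Python model rather than real intrinsics: compress the rejected lanes to the low end, reverse so they sit at the high end, then compress the selected lanes on top, with the leftover lanes passed through from the reversed vector. The helpers maskz_compress and mask_compress are hypothetical names that only imitate the lane semantics of _mm512_maskz_compress_epi32 and _mm512_mask_compress_epi32; the 8-lane width, the `x < pivot` comparison, and partition_lanes itself are assumptions made for illustration.

```python
def maskz_compress(mask, xs):
    """Pack the lanes of xs whose mask bit is set into the low lanes, zero the rest."""
    kept = [x for i, x in enumerate(xs) if (mask >> i) & 1]
    return kept + [0] * (len(xs) - len(kept))


def mask_compress(src, mask, xs):
    """Pack the selected lanes of xs into the low lanes, keep the other lanes of src."""
    kept = [x for i, x in enumerate(xs) if (mask >> i) & 1]
    return kept + src[len(kept):]


def reverse(xs):
    """Stand-in for a lane-reversing shuffle."""
    return xs[::-1]


def partition_lanes(xs, pivot):
    """Selected lanes (x < pivot) end up on the left, rejected lanes on the right."""
    full = (1 << len(xs)) - 1
    # Plays the role of movemask(compare_result); the comparison is an assumption.
    mask = sum(1 << i for i, x in enumerate(xs) if x < pivot)
    rejected_low = maskz_compress(~mask & full, xs)   # first compression
    rejected_high = reverse(rejected_low)             # reverse shuffle in between
    return mask_compress(rejected_high, mask, xs)     # second compression


result = partition_lanes([5, 1, 7, 3, 9, 2, 8, 4], pivot=5)
```

In this model the selected lanes keep their original order and the rejected lanes come out reversed, which may or may not match what the lookup table version produces. The two groups never collide because the two masks are complements, so their lane counts add up to the vector width.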

1 comment

I don't know either! I've never done something like this. Usually I am filtering to the same buffer or one or two other buffers, so I would never immediately load some stuff I had just stored.