by pbsd 4706 days ago
I have no idea how to simulate a variable PALIGNR on Intel chips without making the loop extremely slow.

On AMD (with XOP), it can be done using VPPERM, which can shuffle from 2 sources. We can do variable alignment like this:

  vpperm xmm0, xmm1, xmm2, [[0..31] + offset]

On second thought, we can possibly do something similar on Intel using 2 pshufb and a blend.
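
A minimal scalar sketch of the two-PSHUFB idea, written in plain C rather than intrinsics so the byte movement is easy to follow. `pshufb_model` and `palignr_var_model` are hypothetical names made up for illustration, not real intrinsics, and the offset is assumed to be in 0..16. The final combine here is a plain OR instead of a blend, which works because each shuffle zeroes exactly the bytes the other one supplies.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of PSHUFB on one 16-byte register: a control byte with
   bit 7 set produces 0, otherwise its low 4 bits select a source byte. */
static void pshufb_model(uint8_t dst[16], const uint8_t src[16],
                         const uint8_t ctl[16]) {
    uint8_t tmp[16];
    for (int i = 0; i < 16; i++)
        tmp[i] = (ctl[i] & 0x80) ? 0 : src[ctl[i] & 0x0F];
    memcpy(dst, tmp, 16);
}

/* Hypothetical helper: returns bytes n..n+15 of the 32-byte value hi:lo,
   where lo holds bytes 0..15 and hi holds bytes 16..31 (0 <= n <= 16). */
static void palignr_var_model(uint8_t dst[16], const uint8_t hi[16],
                              const uint8_t lo[16], unsigned n) {
    uint8_t ctl_lo[16], ctl_hi[16], from_lo[16], from_hi[16];
    assert(n <= 16);
    for (unsigned i = 0; i < 16; i++) {
        unsigned idx = i + n;                 /* index into hi:lo */
        ctl_lo[i] = idx < 16 ? (uint8_t)idx : 0x80;          /* zero if taken from hi */
        ctl_hi[i] = idx >= 16 ? (uint8_t)(idx - 16) : 0x80;  /* zero if taken from lo */
    }
    pshufb_model(from_lo, lo, ctl_lo);        /* first shuffle  */
    pshufb_model(from_hi, hi, ctl_hi);        /* second shuffle */
    for (int i = 0; i < 16; i++)
        dst[i] = from_lo[i] | from_hi[i];     /* combine: OR is enough */
}
```

In real SSSE3 code each `pshufb_model` call would correspond to one `_mm_shuffle_epi8` and the last loop to one `_mm_or_si128`. The sketch says nothing about how cheaply the two control vectors can be produced.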

> On second thought, we can possibly do something similar on Intel using 2 pshufb and a blend.

I tried for a bit, but haven't figured out how to make that work. PSHUFB needs a different XMM control operand for each rotate amount. Loading that operand would take 6 cycles, and I haven't thought of a clever way of generating it in fewer.

I do greatly appreciate your help, though. Thanks!
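
On producing the PSHUFB operand for each rotate amount: one possible approach, sketched below under assumptions, is to read the control vectors from a small constant table at a variable byte offset. The layout of `ctl_table` and the helper `load_controls` are hypothetical, invented for this example; `ctl_lo` is meant for the register holding bytes 0..15 and `ctl_hi` for the one holding bytes 16..31, with the rotate amount `n` in 0..16. This still costs a load per control vector, so it does not by itself address the latency concern; it only shows that no per-offset computation is needed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 48-byte table: 16 "zero this byte" controls (bit 7 set),
   then the identity indices 0..15, then 16 more zeroing controls. */
static const uint8_t ctl_table[48] = {
    0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
    0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
       0,    1,    2,    3,    4,    5,    6,    7,
       8,    9,   10,   11,   12,   13,   14,   15,
    0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
    0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
};

/* Fetch both PSHUFB control vectors for a rotate amount n (0 <= n <= 16).
   Each memcpy stands in for one unaligned 16-byte load. */
static void load_controls(uint8_t ctl_lo[16], uint8_t ctl_hi[16], unsigned n) {
    assert(n <= 16);
    memcpy(ctl_hi, ctl_table + n, 16);       /* 0x80 until i + n reaches 16 */
    memcpy(ctl_lo, ctl_table + 16 + n, 16);  /* i + n while below 16, then 0x80 */
}
```

With intrinsics, each `memcpy` would be an `_mm_loadu_si128` from the same addresses. Whether the load latency ends up on the critical path depends on how early the offset is known relative to the shuffle that consumes it.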