Hacker News new | ask | show | jobs
by nkurz 4704 days ago
On second thought, we can possibly do something similar on Intel using 2 pshufb and a blend.

I tried for a bit, but haven't figured out how to make that work. PSHUFB needs a different XMM operand for each 'rotate'. Loading this operand would take 6 cycles, and I haven't thought of a clever way of generating it in less.

I do greatly appreciate your help, though. Thanks!