by nkurz
4704 days ago
On second thought, something similar might be possible on Intel using two PSHUFBs and a blend. I tried for a bit but haven't figured out how to make it work. PSHUFB needs a different XMM control operand for each 'rotate'. Loading that operand would take 6 cycles, and I haven't thought of a clever way of generating it in fewer. I do greatly appreciate your help, though. Thanks!
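
To make the PSHUFB point concrete, here is a minimal scalar sketch in Python of how the instruction selects bytes. It models the documented semantics only and is not SIMD code. It assumes that 'rotate' means a byte-granular rotation of the 16-byte register, which the comment does not spell out, and the helper names `pshufb`, `rotate_control`, and `rotate_bytes` are hypothetical. It does not attempt the two-PSHUFB-plus-blend idea, which remains unsolved above.

```python
def pshufb(src, ctrl):
    """Scalar model of PSHUFB on 16-byte vectors given as lists of ints."""
    assert len(src) == 16 and len(ctrl) == 16
    # If bit 7 of a control byte is set, the output byte is zero;
    # otherwise the low four bits pick which source byte to copy.
    return [0 if c & 0x80 else src[c & 0x0F] for c in ctrl]


def rotate_control(n):
    """Control vector for a rotation by n bytes (assumed meaning of 'rotate')."""
    # Output byte i takes source byte (i + n) mod 16.
    return [(i + n) & 0x0F for i in range(16)]


def rotate_bytes(src, n):
    """Rotate a 16-byte vector by n byte positions using the PSHUFB model."""
    return pshufb(src, rotate_control(n))
```

In this model each rotation amount maps to its own 16-byte control vector, so a variable rotate has to fetch or compute that vector before the shuffle can issue. That is the operand whose load latency is the concern here.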