by TheLoneWolfling
4018 days ago
What's wrong with "Unpack and interleave high-order quadwords from xmm1 and xmm2/m128 into xmm1." and "Multiply packed single-precision floating-point values from ymm1 and ymm2/mem, negate the multiplication result and add to ymm0 and put result in ymm0."? It's simple, right? (This is sarcasm, by the way.) Although I will point out that C has its esoterics also.
|
Unfortunately, compilers indeed aren't very good at picking optimal instructions, even when the code could take good advantage of instructions such as these. No wonder, though.
Packing and unpacking are often needed in a SIMD context. There are a lot of such instructions, including shuffles and permutes. Individually they may sound esoteric, but together they cover a good share of real-life data shuffling needs and are extremely fast.
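
To make the first quoted description less mysterious, here is a rough scalar model in C of what "unpack and interleave high-order quadwords" does. The type `vec128_model` and the function `unpack_high_qwords` are made-up names for illustration only. On x86 you would normally reach the real instruction through the SSE2 intrinsic `_mm_unpackhi_epi64` rather than write it like this.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit XMM register as two 64-bit lanes.
   lane[0] is the low quadword, lane[1] is the high quadword. */
typedef struct { uint64_t lane[2]; } vec128_model;

/* Scalar sketch of "unpack and interleave high-order quadwords":
   the result takes the high quadword of each input. */
static vec128_model unpack_high_qwords(vec128_model a, vec128_model b)
{
    vec128_model r;
    r.lane[0] = a.lane[1];  /* low half of result: high quadword of first operand */
    r.lane[1] = b.lane[1];  /* high half of result: high quadword of second operand */
    return r;
}
```

So with a = {1, 2} and b = {3, 4} (low lane first), the result is {2, 4}. It is one instruction's worth of data movement, which is why these are handy building blocks for things like transposes.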
Fused-multiply-add instructions can double effective FLOPS, because each one performs a multiply and an add in a single instruction.
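
As a sketch of the second quoted description, here is the per-lane arithmetic written as a plain C loop. I am assuming the quote describes the negated multiply-add form (I take it to be `VFNMADD231PS`), and `fnmadd_ps_model` is a hypothetical name, not a real intrinsic. The loop body has one multiply and one add per element, which is the pair of operations the fused instruction does at once across all lanes.

```c
#include <stddef.h>

/* Scalar sketch of the quoted operation, per element:
   acc[i] = -(a[i] * b[i]) + acc[i]
   Note: written as separate multiply and add, plain C may round twice.
   A real fused multiply-add rounds only once, so the lowest bits can differ. */
static void fnmadd_ps_model(float *acc, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        float product = a[i] * b[i];   /* the multiply */
        acc[i] = -product + acc[i];    /* negate the product, then add the accumulator */
    }
}
```

If you want the single-rounding behavior in portable C, the C99 function `fmaf` from `<math.h>` is the standard way to ask for it, e.g. `fmaf(-a[i], b[i], acc[i])`.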