by bigcheesegs
2226 days ago
> Opening the binary with Binary Ninja revealed that clang had already managed to leverage the SSE registers.

x86-64 uses SSE registers for all scalar float and double operations, so seeing them in the disassembly says nothing about vectorization. I'm not sure that the author realized that they were looking at an -O0 binary. -O0 does not do vectorization (or any other optimization, for that matter).
mulss: multiplication of a single single-precision floating point value.
mulsd: multiplication of a single double-precision floating point value.
mulps: multiplication of a packed group of single-precision floating point values.
mulpd: multiplication of a packed group of double-precision floating point values.
If you're mostly seeing -ps suffixes only on moves and shuffles, you're looking at code that is not being vectorized. (And, actually, if you're seeing a lot of shuffles, that's also a good sign it's not well vectorized.)
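To make the suffixes concrete, here is a minimal sketch in plain C. The function names are made up for illustration, and the instructions named in the comments are what GCC and Clang typically emit on x86-64, not a guarantee; the exact output depends on compiler version and flags.

```c
#include <stddef.h>

/* One single-precision multiply: typically a single mulss. */
float mul_one(float a, float b) {
    return a * b;
}

/* Element-wise multiply over float arrays.
   At -O0 this is typically one mulss per iteration.
   With optimization enabled (for example -O3), GCC and Clang will often
   vectorize the loop so the hot path uses mulps on four floats at a time.
   restrict promises the arrays do not overlap, which makes that easier. */
void mul_arrays(const float *restrict a, const float *restrict b,
                float *restrict out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        out[i] = a[i] * b[i];
    }
}

/* Double-precision counterpart: mulsd per element when scalar,
   mulpd (two doubles per 128-bit register) when vectorized. */
void mul_arrays_d(const double *restrict a, const double *restrict b,
                  double *restrict out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        out[i] = a[i] * b[i];
    }
}
```

Compiling this once with -O0 and once with -O3 and comparing the disassembly of mul_arrays is a quick way to see the difference between "uses SSE registers" and "is vectorized".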
Incidentally, if you're seeing unexpected -sd suffixes, those are often due to unintended conversions between float and double. They can have a noticeable effect on performance, especially if you end up calling the double versions of math functions (as they often use iterative algorithms that need more iterations to achieve double-precision).
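A common source of those accidental conversions in C is a floating-point literal without the f suffix. The sketch below uses hypothetical function names, and the instructions in the comments are what one would typically expect on x86-64 rather than a guaranteed result. The same thing happens when calling sin() rather than sinf() on a float argument in C.

```c
/* 0.1 is a double literal, so x is promoted to double before the multiply.
   Typical codegen: cvtss2sd, mulsd, then cvtsd2ss to return a float. */
float scale_promoted(float x) {
    return x * 0.1;
}

/* 0.1f is a float literal, so everything stays in single precision.
   Typical codegen: a single mulss. */
float scale_single(float x) {
    return x * 0.1f;
}
```

The two functions can differ in the last bit of the result, since 0.1 and 0.1f are not the same value, so switching between them is not always a purely mechanical change.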
I'm linking GCC output, because it's simpler to follow, but you see more or less the same struggle with Clang.
https://godbolt.org/z/XtVqsU