by pcordes
3683 days ago
That's correct; I'm sure icc has vectorized versions of those functions that it can inline. Each Monte Carlo iteration is independent, so the code vectorizes very easily if you have a vectorized RNG and vectorized exp() and log(). (Vector sqrt() is available in hardware, so even gcc and clang will vectorize that, but not exp / log, even with -ffast-math.)
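To make the "each iteration is independent" point concrete, here is a minimal C++ sketch of a Monte Carlo European-call pricer. It is only an assumption about roughly what option.cc does (the thread never shows it), and the function names and parameters are hypothetical. The normal draws are generated up front with a scalar std::mt19937_64, and the pricing loop then calls std::exp() once per path with nothing carried between iterations except the running sum:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: draw n standard-normal samples. This is the scalar part;
// std::mt19937_64 has a serial dependency between outputs, which is why a
// vectorized RNG matters for the full Monte Carlo loop.
std::vector<double> normal_draws(std::size_t n, unsigned seed) {
    std::mt19937_64 gen(seed);
    std::normal_distribution<double> dist(0.0, 1.0);
    std::vector<double> z(n);
    for (double& x : z) x = dist(gen);
    return z;
}

// Hypothetical European call pricer (not the actual option.cc).
// Each path depends only on its own z[i]; apart from the sum reduction the
// iterations are independent, so vectorizing hinges on having a SIMD exp().
double mc_call_price(double S0, double K, double r, double sigma, double T,
                     const std::vector<double>& z) {
    assert(!z.empty());
    const double drift = (r - 0.5 * sigma * sigma) * T;
    const double vol = sigma * std::sqrt(T);
    double sum = 0.0;
    for (std::size_t i = 0; i < z.size(); ++i) {
        const double ST = S0 * std::exp(drift + vol * z[i]);  // terminal price
        sum += std::max(ST - K, 0.0);                          // call payoff
    }
    return std::exp(-r * T) * sum / static_cast<double>(z.size());
}

// Closed-form Black-Scholes call price, used only as a sanity check.
double bs_call_price(double S0, double K, double r, double sigma, double T) {
    const double sqrtT = std::sqrt(T);
    const double d1 = (std::log(S0 / K) + (r + 0.5 * sigma * sigma) * T) / (sigma * sqrtT);
    const double d2 = d1 - sigma * sqrtT;
    auto N = [](double x) { return 0.5 * std::erfc(-x / std::sqrt(2.0)); };  // standard normal CDF
    return S0 * N(d1) - K * std::exp(-r * T) * N(d2);
}
```

Calling mc_call_price(100, 100, 0.05, 0.2, 1.0, normal_draws(1000000, 42)) should agree with bs_call_price for the same inputs (about 10.45) to within Monte Carlo noise. With gcc, the pricing loop can only vectorize if a SIMD exp() is available, and the floating-point sum reduction additionally needs reassociation to be allowed (-ffast-math or -Ofast).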
When I can get g++ to use Intel's libimf instead of glibc's libm, it's only ~5% slower than icpc. I'd guess much of the remaining difference is function-call overhead versus inlining. Clang lags by another ~10%, probably because I haven't yet found the right incantation to make it use libimf without also disabling some other useful optimization.
Here are the commands that I ended up with:
  clang++ -fno-finite-math-only -march=native -Wall -Wextra -g -O3 option.cc -o option -Wl,-rpath=/opt/intel/compilers_and_libraries/linux/lib/intel64 -L/opt/intel/compilers_and_libraries/linux/lib/intel64 -limf -lintlc
  g++ -fno-finite-math-only -march=native -Wall -Wextra -g -Ofast option.cc -o option -Wl,-rpath=/opt/intel/compilers_and_libraries/linux/lib/intel64 -L/opt/intel/compilers_and_libraries/linux/lib/intel64 -limf -lintlc
  icpc -march=native -Wall -Wextra -g -Ofast option.cc -o option
The "-fno-finite-math-only" flag disables an otherwise useful clang++ and g++ optimization that requires an __exp_finite() function, which libimf does not provide. clang++ also seems to need -O3 instead of -Ofast for this to stick.
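One way to check whether a given command line really leaves finite-math-only off is to look at the predefined __FINITE_MATH_ONLY__ macro: gcc and clang define it as 1 when finite-math-only is in effect (e.g. via -ffinite-math-only, -ffast-math or -Ofast) and 0 otherwise, and the glibc headers of that era key off it to redirect exp()/log() to the __exp_finite()-style entry points. A small sketch (the function names are hypothetical), compiled with the same flags as the real build:

```cpp
#include <cstdio>

// Reports whether the compiler is in finite-math-only mode for this
// translation unit, based on the predefined __FINITE_MATH_ONLY__ macro.
bool finite_math_only_enabled() {
#if defined(__FINITE_MATH_ONLY__) && __FINITE_MATH_ONLY__
    return true;
#else
    return false;
#endif
}

// Prints the mode, e.g. to confirm that -fno-finite-math-only survived
// being combined with -Ofast (or not).
void report_finite_math_mode() {
    std::printf("finite-math-only: %s\n", finite_math_only_enabled() ? "on" : "off");
}
```

If this prints "on" for a build that links against libimf, that would be consistent with the missing-symbol link errors described above.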
What I don't know yet is whether recompiling glibc (or upgrading to the most recent glibc) will give better out-of-the-box performance on recent Intel CPUs. My searches aren't turning up much information. Does anyone know?
> gcc and clang will vectorize that, but not exp / log
I don't know how well it's currently working, but it looks like this might have changed recently: https://sourceware.org/glibc/wiki/libmvec
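For anyone who wants to try it, here is a sketch of a loop shaped for libmvec, under some assumptions: x86-64 Linux, gcc, and glibc 2.22 or later. glibc only declares the SIMD variants of exp()/log() when __FAST_MATH__ is defined, so something like `g++ -O3 -ffast-math -march=native` (or -Ofast) is needed; depending on the gcc/glibc versions, -fopenmp-simd may also be required for the pragma and the declarations to take effect. The function name is hypothetical:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Independent iterations calling exp() and log(). With the flags above, gcc may
// replace the scalar calls with libmvec entry points such as _ZGVdN4v_exp
// (AVX2, 4 doubles at a time). Without them the pragma is ignored (possibly
// with a -Wunknown-pragmas warning) and the same values come from scalar calls.
void exp_log_loop(const std::vector<double>& in, std::vector<double>& out) {
    out.resize(in.size());
    const double* src = in.data();
    double* dst = out.data();
    const std::size_t n = in.size();
#pragma omp simd
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = std::exp(src[i]) + std::log(src[i] + 1.0);
    }
}
```

Checking the disassembly (e.g. with objdump -d) for _ZGV-prefixed calls is a quick way to see whether libmvec was actually used.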