
OK, I was able to test this more rigorously. Yes, the vast majority of the difference between Intel and GCC comes from Intel's better implementations of exp() and log(). These are not vector implementations that calculate multiple results simultaneously, just scalar implementations that make better use of the available instruction set.

When I get g++ to use Intel's libimf instead of glibc's libm, it's only ~5% slower than icpc. I'd guess much of the remaining difference is function-call overhead versus inlining. Clang still lags by another 10%, but that's probably because I haven't found quite the right incantation to make it use libimf without also disabling some other useful optimization.

Here are the commands that I ended up with:

clang++ -fno-finite-math-only -march=native -Wall -Wextra -g -O3 option.cc -o option -Wl,-rpath=/opt/intel/compilers_and_libraries/linux/lib/intel64 -L/opt/intel/compilers_and_libraries/linux/lib/intel64 -limf -lintlc

g++ -fno-finite-math-only -march=native -Wall -Wextra -g -Ofast option.cc -o option -Wl,-rpath=/opt/intel/compilers_and_libraries/linux/lib/intel64 -L/opt/intel/compilers_and_libraries/linux/lib/intel64 -limf -lintlc

icpc -march=native -Wall -Wextra -g -Ofast option.cc -o option

The -fno-finite-math-only flag disables an otherwise good clang++ and g++ optimization that requires an __exp_finite() function, which libimf does not provide. clang++ also seems to need -O3 instead of -Ofast for this to stick.

What I don't know yet is whether recompiling glibc (or upgrading to the most recent glibc) would give better out-of-the-box performance on recent Intel CPUs. My searches aren't turning up much information. Does anyone know?

gcc and clang will vectorize that, but not the exp() / log() calls.

I don't know how well it's currently working, but it looks like this might have changed recently: https://sourceware.org/glibc/wiki/libmvec