by nkurz
4331 days ago
'icpc' (the Intel C++ compiler) has equal performance for both of the test cases, and it did choose to use different registers for each call. But it's not clear if that's by design or by chance. In some ways, that's the boring part. The interesting part (to me) is that both tests are much faster than either version with g++.

Here's icpc 14.0.3 vs g++ 4.8.1 on a Sandy Bridge E5-1620 @ 3.60GHz and a Haswell i7-4770 CPU @ 3.40GHz:

nate@sandybridge:~/tmp$ g++ -O3 -march=native -std=c++11 popcnt-dependency.cpp -o popcnt-dependency
nate@sandybridge:~/tmp$ popcnt-dependency 1
unsigned 41959360000 0.608615 sec 17.2289 GB/s
uint64_t 41959360000 0.82312 sec 12.739 GB/s
nate@sandybridge:~/tmp$ icpc -O3 -march=native -std=c++11 popcnt-dependency.cpp -o popcnt-dependency
nate@sandybridge:~/tmp$ popcnt-dependency 1
unsigned 41959360000 0.182781 sec 57.3679 GB/s
uint64_t 41959360000 0.182638 sec 57.4128 GB/s
nate@haswell:~/tmp$ g++ -O3 -march=native -std=c++11 popcnt-dependency.cpp -o popcnt-dependency
nate@haswell:~/tmp$ popcnt-dependency 1
unsigned 41959360000 0.401225 sec 26.1343 GB/s
uint64_t 41959360000 0.75841 sec 13.826 GB/s
nate@haswell:~/tmp$ icpc -O3 -march=native -std=c++11 popcnt-dependency.cpp -o popcnt-dependency
nate@haswell:~/tmp$ popcnt-dependency 1
unsigned 41959360000 0.0843861 sec 124.259 GB/s
uint64_t 41959360000 0.0842836 sec 124.41 GB/s

That would be incredible if true! But I think it's a bug, since the inner loop looks far too short and doesn't seem to be repeating the popcnts. I'm not sure yet whether it's a problem with the compiler or whether the test case is abusing something undefined.
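
One way to check the suspicion that the popcnts are not being repeated is to force the compiler to redo the counting on every pass. What follows is only a minimal sketch, assuming a GCC-compatible compiler (the empty inline asm statement and __builtin_popcountll are extensions that g++ supports and that icpc on Linux is expected to accept). popcount_repeated is a hypothetical helper, not the code in popcnt-dependency.cpp.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: sums the set bits of buf[0..len), repeating the
// whole pass `repeats` times, as a benchmark loop might.
std::uint64_t popcount_repeated(const std::uint64_t* buf, std::size_t len,
                                int repeats) {
    std::uint64_t count = 0;
    for (int r = 0; r < repeats; ++r) {
        // Compiler barrier (GCC-style extension): emits no instructions, but
        // the "memory" clobber makes the compiler assume the buffer may have
        // changed, so it cannot reuse the previous pass's result.
        asm volatile("" : : "r"(buf) : "memory");
        for (std::size_t i = 0; i < len; ++i) {
            count += static_cast<std::uint64_t>(__builtin_popcountll(buf[i]));
        }
    }
    return count;
}
```

If the throughput with the barrier in place falls back toward what the hardware can plausibly sustain, that would suggest the short inner loop came from the compiler reusing work across passes rather than from faster popcnt code. It would not by itself say whether that transformation was legitimate.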