
  nate@sandybridge:~/tmp$ g++ -O3 -march=native -std=c++11 popcnt-dependency.cpp -o popcnt-dependency
  nate@sandybridge:~/tmp$ popcnt-dependency 1
  unsigned	41959360000	0.608615 sec 	17.2289 GB/s
  uint64_t	41959360000	0.82312 sec 	12.739 GB/s

  nate@sandybridge:~/tmp$ icpc -O3 -march=native -std=c++11 popcnt-dependency.cpp -o popcnt-dependency
  nate@sandybridge:~/tmp$ popcnt-dependency 1
  unsigned	41959360000	0.182781 sec 	57.3679 GB/s
  uint64_t	41959360000	0.182638 sec 	57.4128 GB/s

  nate@haswell:~/tmp$ g++ -O3 -march=native -std=c++11 popcnt-dependency.cpp -o popcnt-dependency
  nate@haswell:~/tmp$ popcnt-dependency 1
  unsigned	41959360000	0.401225 sec 	26.1343 GB/s
  uint64_t	41959360000	0.75841 sec 	13.826 GB/s

  nate@haswell:~/tmp$ icpc -O3 -march=native -std=c++11 popcnt-dependency.cpp -o popcnt-dependency
  nate@haswell:~/tmp$ popcnt-dependency 1
  unsigned	41959360000	0.0843861 sec 	124.259 GB/s
  uint64_t	41959360000	0.0842836 sec 	124.41 GB/s
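The GB/s column can be reproduced if you assume that the argument `1` selects a 1 MiB buffer (2^20 bytes) that is scanned 10000 times. That buffer size is an assumption, since popcnt-dependency.cpp isn't shown here, but it matches the reported figures. A small sketch under that assumption also shows why the 124 GB/s result can't be real: popcnt handles 8 bytes per instruction and these cores issue at most one popcnt per cycle, so GB/s divided by 8 is the minimum clock (in GHz) the measurement would require. The function names are hypothetical.

```cpp
#include <cassert>

// ASSUMPTION (not stated in the post): "popcnt-dependency 1" scans a
// 1 MiB buffer, and the outer loop repeats that scan 10000 times.
constexpr double kBufferBytes = 1024.0 * 1024.0;
constexpr double kRepeats = 10000.0;

// Throughput in GB/s, taking 1 GB = 10^9 bytes.
double gb_per_sec(double seconds) {
    assert(seconds > 0.0);
    return kBufferBytes * kRepeats / seconds / 1e9;
}

// popcnt consumes 8 bytes per instruction, and Sandy Bridge / Haswell
// issue at most one popcnt per cycle, so GB/s divided by 8 is the lowest
// clock (in GHz) at which the measured throughput is even possible.
double min_clock_ghz(double gb_s) {
    return gb_s / 8.0;
}
```

With the Haswell 'icpc' time of 0.0843861 sec this gives about 124.26 GB/s, which would need a clock above 15 GHz. By contrast, a Haswell time of 0.351273 sec works out to about 29.85 GB/s, or roughly 3.7 GHz, which is a believable clock speed.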

OK, it looks like 'icpc' has decided that it would be fastest to invert the two loops: popcnt() once, then repeat the addition 10000 times. I'm neither a language lawyer nor a friend of C++, so I'll refrain from trying to decide whether this is a legal optimization. But a liberal sprinkling of 'volatile' makes it do what was obviously intended. After this, the speeds are more comparable to g++'s, although 'icpc' retains a small (but much more plausible) lead:
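Here is a minimal sketch of the kind of loop nest involved. It assumes, since the source file isn't shown, that the benchmark wraps a popcount-and-sum loop over a buffer in an outer repeat loop. The function names are hypothetical. `__builtin_popcountll` is the GCC/Clang builtin, which compiles to a single popcnt when `-march=native` enables it on these CPUs. Reading through a `volatile` pointer is just one possible place to put the `volatile`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reconstruction: an outer "repeat" loop around an inner loop
// that popcounts every 64-bit word of a buffer into a running total.
uint64_t count_plain(const std::vector<uint64_t>& buf, unsigned repeats) {
    uint64_t total = 0;
    for (unsigned k = 0; k < repeats; ++k) {
        // Nothing in the inner loop depends on k, so its sum is
        // loop-invariant: a compiler may compute it once and then just add
        // it 'repeats' times, which matches what 'icpc' appears to have done.
        for (std::size_t i = 0; i < buf.size(); ++i)
            total += __builtin_popcountll(buf[i]);
    }
    return total;
}

// Reading the buffer through a volatile pointer forces every word to be
// loaded again on every repeat, so the inner loop cannot be hoisted out.
uint64_t count_volatile(const std::vector<uint64_t>& buf, unsigned repeats) {
    const volatile uint64_t* p = buf.data();
    uint64_t total = 0;
    for (unsigned k = 0; k < repeats; ++k)
        for (std::size_t i = 0; i < buf.size(); ++i)
            total += __builtin_popcountll(p[i]);
    return total;
}
```

Both functions return the same total. The only difference is how much work the optimizer is allowed to skip, which is exactly what a benchmark needs to control.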

  nate@sandybridge:~/tmp$ popcnt-dependency 1
  unsigned	41959360000	0.517827 sec 	20.2495 GB/s
  uint64_t	41959360000	0.518041 sec 	20.2412 GB/s


  nate@haswell:~/tmp$ popcnt-dependency 1
  unsigned	41959360000	0.351273 sec 	29.8507 GB/s
  uint64_t	41959360000	0.352914 sec 	29.712 GB/s

The other test I did was checking what Intel's IACA (a wonderful optimization tool that you really should be using if you are not already) thought about the g++ loop. It did _not_ notice the false dependency, and said the unsigned and uint64_t loops should take the same amount of time. Does this suggest that the Intel compiler is just getting lucky, or that Intel doesn't have great internal communication between teams?
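For anyone who hasn't followed the thread: the "false dependency" is that on Sandy Bridge and Haswell, `popcnt dst, src` is treated as if it also read `dst`. It therefore can't start until the previous instruction writing `dst` has finished, which can chain iterations of an otherwise independent loop together. A common workaround is to zero the destination with `xor` first, which these cores recognize as a dependency-breaking idiom. The sketch below assumes x86-64 and GCC/Clang extended inline asm. `popcnt_break_dep` is a hypothetical name, and the workaround only controls the one instruction it emits, not the register choices the compiler makes around it.

```cpp
#include <cassert>
#include <cstdint>

// Popcount whose destination register is zeroed first, so the popcnt
// does not wait on whatever last wrote that register.
inline uint64_t popcnt_break_dep(uint64_t x) {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    uint64_t r;
    // "=&r" (early-clobber) keeps r out of x's register; otherwise the
    // xor could wipe out the input before popcnt reads it.
    // Both xor and popcnt write the flags, hence the "cc" clobber.
    __asm__("xorl %k0, %k0\n\t"
            "popcntq %1, %0"
            : "=&r"(r)
            : "r"(x)
            : "cc");
    return r;
#else
    // Not x86-64: fall back to the GCC/Clang builtin.
    return static_cast<uint64_t>(__builtin_popcountll(x));
#endif
}
```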