by nkurz
4338 days ago
OK, it looks like 'icpc' has decided that it would be fastest to invert the two loops: do the popcnt() once, then repeat the addition 10000 times. I'm neither a language lawyer nor a friend of C++, so I'll refrain from trying to decide whether this is a legal optimization. But a liberal sprinkling of 'volatile' makes it do what was obviously intended. After this, the speeds are more comparable, although 'icpc' retains a small (but much more plausible) lead:
nate@sandybridge:~/tmp$ popcnt-dependency 1
unsigned 41959360000 0.517827 sec 20.2495 GB/s
uint64_t 41959360000 0.518041 sec 20.2412 GB/s
nate@haswell:~/tmp$ popcnt-dependency 1
unsigned 41959360000 0.351273 sec 29.8507 GB/s
uint64_t 41959360000 0.352914 sec 29.712 GB/s
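For anyone who wants to reproduce the effect, here is a minimal sketch of the 'volatile' workaround. It assumes a GCC/Clang/icpc-style compiler that provides __builtin_popcountll. The function name repeated_popcount and its parameters are hypothetical, and this is not the benchmark code behind the numbers above. Because the buffer is read through a volatile-qualified pointer, every pass has to re-load every word, so the compiler can no longer compute the inner loop once and scale the result by the repeat count.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: sum the popcounts of `words[0..n)` over `reps` passes.
// Reading through a pointer to const volatile forces a fresh load of each
// element on every pass, which stops the compiler from hoisting the inner
// loop out of the repetition loop (the loop inversion described above).
// Assumption: __builtin_popcountll is available (GCC, Clang, icpc).
uint64_t repeated_popcount(const uint64_t* words, std::size_t n, unsigned reps) {
    const volatile uint64_t* v = words;  // adding cv-qualifiers is an implicit conversion
    uint64_t count = 0;
    for (unsigned r = 0; r < reps; ++r) {
        for (std::size_t i = 0; i < n; ++i) {
            // Each v[i] is a separate volatile read that cannot be elided.
            count += static_cast<uint64_t>(__builtin_popcountll(v[i]));
        }
    }
    return count;
}
```

With GCC and -O2 -mpopcnt the builtin should compile to the hardware popcnt instruction; without -mpopcnt it may fall back to a library routine, which would make any timing comparison meaningless.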
The other test I did was checking what Intel's IACA (a wonderful optimization tool that you really should be using if you are not already) thought about the g++ loop. It did _not_ notice the false dependency, and said the loops should take the same amount of time. Does this suggest that the Intel compiler is just getting lucky, or that Intel doesn't have great internal communication between teams?