by stormbrew
1478 days ago
I think the point GP is making is that the data dependency causes stalls even without vectorization, in ordinary single-data (scalar) instructions. That is, a data dependency between loop iterations hurts performance even for non-vectorizable calculations (or on CPUs with high ILP but no really good vector instructions, which granted is probably a small pool, since both of those things came about at around the same time).
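To make that concrete, here is a minimal sketch in plain scalar C. The function names `sum_chained` and `sum_split` are made up for illustration. The first loop has one accumulator, so each add has to wait for the previous one. The second splits the work into four independent chains that an out-of-order core can overlap. Whether it is actually faster depends on the compiler, flags, and CPU, so treat it as an illustration rather than a benchmark.

```c
#include <stddef.h>

/* One accumulator: each add needs the result of the previous add,
   so the loop is bound by add latency (a loop-carried dependency). */
double sum_chained(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Four accumulators (an assumed, arbitrary unroll factor): four
   independent dependency chains that the CPU can run in parallel,
   no vector instructions required. */
double sum_split(const double *a, size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    /* Leftover elements when n is not a multiple of 4. */
    for (; i < n; i++)
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}
```

Floating-point addition is not associative, so the two versions can differ in the last bits on general input. That is also why a compiler will usually not make this transformation for you without something like relaxed floating-point flags.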
Loop-carried dependency is the big culprit here. I wish we had a culture of writing for loops where the index is a constant inside the loop body, as in the Ada for statement, rather than the clever C while loops or for loops with two running variables... Simpler loop syntax makes so many static analyses 'easier' and more or less forces the brain to think in bounded, independent steps, or to reach for higher-level constructs (e.g. reduce).
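A rough sketch of the two styles, staying in C. The names `scale_running` and `scale_indexed` are hypothetical. Both do the same thing, but in the second the index is the only loop variable and the body never assigns to it, which is about as close as C gets to the Ada rule by convention.

```c
#include <stddef.h>

/* "Clever" style: two running pointers plus a countdown, all of
   them mutated as the loop goes. */
void scale_running(double *dst, const double *src, size_t n, double k) {
    while (n--)
        *dst++ = *src++ * k;
}

/* Ada-like style: i is the only loop variable and is never written
   in the body, so iteration i visibly touches only dst[i] and src[i]. */
void scale_indexed(double *dst, const double *src, size_t n, double k) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}
```

Nothing in C enforces the second form. A modern compiler may well generate the same code for both, so the benefit here is mostly for the reader and for simpler analyses.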