by stormbrew 1478 days ago
I think the point GP is making is that even without vectorization, the data dependency causes stalls in ordinary scalar (single-data) instructions. That is, a data dependency between loop iterations will hurt performance even for non-vectorizable calculations (or on CPUs with high ILP but no really good vector instructions, which granted is probably a small pool, since both of those things came about at around the same time).

I think that is the point Peter Cordes is trying to make (quite politely but firmly) over on Stack Overflow. It's not only about autovectorization. The main point there is the loop-carried dependency, which prevents both the compiler (autovectorization) and the processor (ILP) from doing their thing.
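
To make the loop-carried dependency concrete, here is a minimal sketch in Python. The function names are made up for illustration, and in CPython the interpreter overhead hides any timing difference, so read it only as a picture of the dependency structure. In the first loop every addition has to wait for the previous total. In the second, the four partial sums never read each other inside the loop, which is the shape that lets a compiler or an out-of-order core overlap the work in compiled code.

```python
def sum_single_chain(values):
    """One accumulator: each addition depends on the previous iteration."""
    total = 0
    for v in values:
        total += v  # needs the total produced by the iteration before
    return total


def sum_four_chains(values):
    """Four accumulators: four independent dependency chains."""
    a0 = a1 = a2 = a3 = 0
    n = len(values) - len(values) % 4  # largest multiple of 4 that fits
    for i in range(0, n, 4):
        a0 += values[i]        # chain 0
        a1 += values[i + 1]    # chain 1, never reads a0
        a2 += values[i + 2]    # chain 2
        a3 += values[i + 3]    # chain 3
    tail = sum(values[n:])     # leftover 0 to 3 elements
    return a0 + a1 + a2 + a3 + tail
```

Both functions agree exactly on integers. With floating point the regrouped additions can round differently, which is one reason a compiler will not make this change on its own unless it is told that reassociation is acceptable.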

Loop-carried dependency is the big culprit here. I wish we had a culture of writing for loops in which the index is a constant inside the loop body, as in the Ada for statement, rather than the clever C while loops or for loops with two running variables... Simpler loop syntax makes so many static analyses 'easier' and kind of forces the brain to think in bounded, independent steps, or to reach for higher-level constructs (e.g. reduce).
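
A rough illustration of the two styles mentioned above, again in Python and with hypothetical function names. The first version is a bounded loop whose index is never modified in the body, so the trip count is known before the loop starts. The second uses `functools.reduce`, which names the combining operation explicitly. Python's own `reduce` still runs strictly left to right; the point is only that the code states "combine these with `add`" instead of spelling out a running variable.

```python
from functools import reduce
from operator import add


def sum_squares_bounded(values):
    """Bounded loop: i is fixed within each iteration, trip count known up front."""
    total = 0
    for i in range(len(values)):
        total += values[i] * values[i]
    return total


def sum_squares_reduce(values):
    """Same result expressed as map (square) followed by reduce (add)."""
    return reduce(add, (v * v for v in values), 0)
```

Because the per-element work (squaring) is separated from the combining step (addition), it is easier to see that the only thing carried between iterations is the sum itself.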

I wonder if it becomes a math problem to optimize then, like the story of Gauss summing 1 to 100 by pairing 1 with 100 to get 101 and multiplying by the 50 pairs that pairing creates to get 5050?
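
For what it's worth, the pairing trick generalizes to the closed form n(n+1)/2, which removes the loop, and with it the dependency chain, entirely. A small sketch with made-up function names:

```python
def sum_to_n_loop(n):
    """Sum 1..n with a loop: n additions on one dependency chain."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_to_n_closed_form(n):
    """Sum 1..n via the pairing argument: n/2 pairs, each summing to n + 1."""
    return n * (n + 1) // 2  # n * (n + 1) is always even, so // is exact
```

For n = 100 this is 100 * 101 // 2 = 5050, matching the loop. Whether a given compiler finds this kind of rewrite by itself depends on the compiler and on how simple the loop is.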