by lukefleed
216 days ago
Thanks! I used perf to look at cache miss rates and memory bandwidth during runs. The measurements showed the pattern I expected, but I didn't do a rigorous profiling study (different cache sizes, controlled benchmarks across architectures, or proper statistical analysis). This was for a university exam, and I ran out of time to do it properly. The cache argument makes intuitive sense (three vectors cycling vs. scanning a growing n×k matrix), and the timing data supports it, but I'd want to instrument it more carefully in the future :)
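
For a rough sense of scale, the cache argument can be put as back-of-the-envelope arithmetic. This is only a sketch, not a measurement: the function name `working_set_bytes`, the sizes `n` and `k`, and the 8-byte float width are all assumptions chosen for illustration.

```python
def working_set_bytes(n, k, bytes_per_float=8):
    """Compare memory touched per step: three length-n vectors vs. an n-by-k matrix.

    n, k, and bytes_per_float are illustrative assumptions, not values from the post.
    """
    three_vectors = 3 * n * bytes_per_float   # fixed-size working set
    full_matrix = n * k * bytes_per_float     # grows linearly with k
    return three_vectors, full_matrix


# Hypothetical sizes: one million rows, 200 columns.
small, large = working_set_bytes(n=1_000_000, k=200)
print(f"three vectors: {small / 1e6:.0f} MB, n x k matrix: {large / 1e6:.0f} MB")
```

With these assumed sizes the three-vector working set stays constant while the matrix grows with k, which is the intuition behind the expected difference in cache behavior; whether it fits in any particular cache level depends on the hardware.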