by sharpneli
4575 days ago
Did you use restrict? I made a simple test:

  void nsum(float **v, float *acc, int n, int vc)
  {
    int j, i;
    for (i = 0; i < n; i++)
      for (j = 0; j < vc; j++)
        acc[i] += v[j][i] * v[j][i];
  }

And then I tested the same function with a different declaration:

  void nsum(float * restrict * v, float * restrict acc, int n, int vc)

The version without the restrict qualifier ran in 1.01 s. The version with restrict ran in 0.45 s. Both were compiled with identical flags (just -O3) using the ancient gcc 4.4.5 (the vectorizer is enabled by default at -O3 even in this version). That's better than a 2x speedup (1.01 / 0.45 ≈ 2.2) from nothing more than a change to the pointer declarations.
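If you want to reproduce the comparison, here's a minimal C99 sketch. The names `nsum_plain`, `nsum_restrict`, `nsum_local`, `nsum_restrict_wrap`, `nsum_fn` and `time_nsum` are all hypothetical, chosen for illustration. My assumption about where the gap comes from (not verified against the generated assembly): without restrict, gcc has to assume that a store to `acc[i]` might modify a float it later reads through some `v[j]`, so it reloads and stores `acc[i]` on every inner iteration. With restrict it is allowed to keep the running sum in a register. `nsum_local` does that by hand as one way to get a similar effect without restrict; I haven't measured whether it matches the restrict version.

```c
#include <time.h>

/* Original loop, no aliasing promises: a store to acc[i] could, as far
   as the compiler knows, change some v[j][i] read on a later iteration. */
void nsum_plain(float **v, float *acc, int n, int vc)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < vc; j++)
            acc[i] += v[j][i] * v[j][i];
}

/* Same loop with the restrict-qualified declaration from the comment:
   acc and the rows v[j] are promised not to overlap. */
void nsum_restrict(float * restrict * v, float * restrict acc, int n, int vc)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < vc; j++)
            acc[i] += v[j][i] * v[j][i];
}

/* Manual alternative (assumption: similar effect to restrict): keep the
   running sum in a local so acc[i] is loaded and stored once per i.
   Same order of additions, so results match when nothing aliases. */
void nsum_local(float **v, float *acc, int n, int vc)
{
    for (int i = 0; i < n; i++) {
        float sum = acc[i];
        for (int j = 0; j < vc; j++)
            sum += v[j][i] * v[j][i];
        acc[i] = sum;
    }
}

typedef void (*nsum_fn)(float **v, float *acc, int n, int vc);

/* Wrapper so the restrict version fits nsum_fn; a float ** argument
   converts implicitly to float * restrict *. */
void nsum_restrict_wrap(float **v, float *acc, int n, int vc)
{
    nsum_restrict(v, acc, n, vc);
}

/* CPU time in seconds for `reps` calls of f, measured with clock().
   acc keeps accumulating across reps. */
double time_nsum(nsum_fn f, float **v, float *acc, int n, int vc, int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        f(v, acc, n, vc);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

To use it, allocate `vc` rows of `n` floats, zero `acc`, and compare `time_nsum(nsum_plain, ...)` against `time_nsum(nsum_restrict_wrap, ...)` and `time_nsum(nsum_local, ...)` at -O3. Expect the numbers to depend on compiler version and hardware. Calling through a function pointer keeps each variant from being inlined into the timing loop, which makes the comparison a bit fairer.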