by nkurz
4575 days ago
Normally I'd use restrict and float pointers, but since I was trying to repeat what the original poster did, I used fixed-size arrays instead. Because of this, I did not see a difference with 'restrict'. But I might be missing something, or might have messed up the array indexing. The optimized function GCC generates is 500 instructions long, and thus difficult to scan. I put my untested test code up here: http://pastebin.com/qB0DfkXN
Without restrict, the code of sum_of_squares_1 is as follows:
As you can see, it stores dst[y] on each iteration. With a function definition of: void sum_of_squares_1(float dst[restrict ROWS], float src[restrict ROWS][COLS]) the disassembly becomes completely different. However, the speed of the end result did not really change that much.

Could you throw objdump -d of the best icc output to pastebin? I'm interested to see what kind of code it produces.
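
For anyone following along, here is a rough sketch of the two variants being compared. Only the restrict signature comes from the comment above. The loop body, the sizes chosen for ROWS and COLS, and the name sum_of_squares_noalias are my assumptions, not the original poster's code. The idea: without restrict the compiler has to assume dst might overlap src, so it may write dst[y] back to memory on every inner iteration; with restrict it is free to keep the running sum in a register.

```c
#include <stddef.h>

/* Assumed sizes for illustration only; the thread does not give them. */
#define ROWS 4
#define COLS 3

/* No restrict: dst may alias src as far as the compiler knows, so each
   update of dst[y] may have to be stored before the next load of src. */
void sum_of_squares_1(float dst[ROWS], float src[ROWS][COLS])
{
    for (size_t y = 0; y < ROWS; y++) {
        dst[y] = 0.0f;
        for (size_t x = 0; x < COLS; x++)
            dst[y] += src[y][x] * src[y][x];
    }
}

/* Hypothetical name. Same body, but with the restrict-qualified array
   parameters quoted above, which promise that dst and src do not overlap. */
void sum_of_squares_noalias(float dst[restrict ROWS],
                            float src[restrict ROWS][COLS])
{
    for (size_t y = 0; y < ROWS; y++) {
        dst[y] = 0.0f;
        for (size_t x = 0; x < COLS; x++)
            dst[y] += src[y][x] * src[y][x];
    }
}
```

Both versions compute the same result when the arrays do not overlap; any difference shows up only in the generated code, so compare the disassembly at the same optimization level.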