by mhkool
1593 days ago
Since the performance for array sizes below the L1 size and below the L2 size is similar, I would like to see an attempt to improve B.
Setting B = L2-size / 2 / sizeof(int) - 16 might produce better results. Note also that _mm_broadcast_ss() is faster on newer processors.
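A minimal C sketch of the suggested block size follows. It assumes B counts `int` elements and that the L2 size is known in bytes. `suggested_block_size` and `detect_l2_bytes` are hypothetical helpers, not part of the original code. The `sysconf(_SC_LEVEL2_CACHE_SIZE)` query is a glibc extension, so the sketch falls back to a caller-supplied value where it is unavailable or reports nothing.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: returns the L2 cache size in bytes, or `fallback`
 * when the platform cannot report it. _SC_LEVEL2_CACHE_SIZE is a glibc
 * extension, so it is only queried when the macro is defined. */
static size_t detect_l2_bytes(size_t fallback)
{
#ifdef _SC_LEVEL2_CACHE_SIZE
    long v = sysconf(_SC_LEVEL2_CACHE_SIZE);
    if (v > 0)
        return (size_t)v;
#endif
    return fallback;
}

/* Hypothetical helper: block size B (in ints) per the suggested heuristic,
 * i.e. half of L2 expressed as a count of ints, minus 16 ints.
 * Guards against unsigned underflow for very small caches. */
static size_t suggested_block_size(size_t l2_bytes)
{
    size_t elems = l2_bytes / 2 / sizeof(int);
    return elems > 16 ? elems - 16 : 1;
}
```

For example, a 256 KiB L2 with a 4-byte `int` gives 262144 / 2 / 4 - 16 = 32752 elements. Whether this actually beats the current B would still need to be measured.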