by dfbrown
3540 days ago
I would try unrolling 2-4 iterations of the loop. Multiple sequential loads aren't much slower than a single load, so batching your loads and stores together lets you do more arithmetic operations each time you hit memory.
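To make that concrete, here's a rough sketch of a 4x unroll. The loop from the original post isn't shown here, so this assumes a simple "multiply every element by a constant" loop; `scale_simple`, `scale_unrolled4`, and the parameter names are made up for illustration.

```c
#include <stddef.h>

/* Baseline (hypothetical): one load, one multiply, one store per iteration. */
void scale_simple(float *dst, const float *src, float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* Same work unrolled by 4: the loads are grouped, then the arithmetic,
 * then the stores. */
void scale_unrolled4(float *dst, const float *src, float k, size_t n)
{
    size_t i = 0;

    /* Main body: handles four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        /* Batch the loads. */
        float a = src[i];
        float b = src[i + 1];
        float c = src[i + 2];
        float d = src[i + 3];

        /* Do the arithmetic on values already in registers. */
        a *= k;
        b *= k;
        c *= k;
        d *= k;

        /* Batch the stores. */
        dst[i]     = a;
        dst[i + 1] = b;
        dst[i + 2] = c;
        dst[i + 3] = d;
    }

    /* Tail: the 0-3 elements left over when n is not a multiple of 4. */
    for (; i < n; i++)
        dst[i] = src[i] * k;
}
```

Both functions should produce the same output, so you can swap one for the other and time them. An optimizing compiler may already unroll or vectorize the simple version on its own, so measure before keeping the manual unroll.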