by dragontamer
1481 days ago
Unroll the dependency until the dependency distance is at least the SIMD width. Ex: as long as iterations i, i+1, i+2, i+3, ... i+7 do not depend on each other, you can vectorize at SIMD width 8. In other words: i+7 can depend on i-1 with no problem.
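
A rough sketch of that idea in C, not taken from the SO post: the recurrence `a[i] = a[i-8] + b[i]`, the `WIDTH` constant, and both function names are made up for illustration. Each element depends only on the element 8 positions back, so any 8 consecutive iterations are independent and can be treated as one vector lane group.

```c
#include <stddef.h>

#define WIDTH 8  /* assumed SIMD width in lanes (hypothetical, for illustration) */

/* Scalar reference: a[i] depends on a[i - WIDTH], never on a nearer element. */
void recur_scalar(float *a, const float *b, size_t n) {
    for (size_t i = WIDTH; i < n; i++)
        a[i] = a[i - WIDTH] + b[i];
}

/* Blocked form: the WIDTH lanes of each block read only from the previous
   block, so the inner loop carries no dependency between its iterations
   and is a candidate for a single vector add. */
void recur_blocked(float *a, const float *b, size_t n) {
    size_t i = WIDTH;
    for (; i + WIDTH <= n; i += WIDTH)
        for (size_t lane = 0; lane < WIDTH; lane++)
            a[i + lane] = a[i + lane - WIDTH] + b[i + lane];
    for (; i < n; i++)  /* scalar tail for the leftover elements */
        a[i] = a[i - WIDTH] + b[i];
}
```

Both functions compute the same result. Whether the blocked form actually ends up faster depends on the compiler and the target, so it is worth checking the generated assembly rather than assuming it vectorized.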
> Ex: as long as i, i+1, i+2, i+3, ... i+7 are not dependent on each other, you can vectorize to SIMD-width 8.
Do you mean like this? I get this running about as fast as the first "unoptimized" version in the SO post, but no faster.