by dreamcompiler
1478 days ago
The other issue is instruction-level parallelism, as another poster in TFA pointed out. Even within a single loop iteration the "unoptimized" code is more likely to exploit multiple ALUs if they exist, regardless of vectorization instructions.
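To make the ILP point concrete, here is a rough sketch. It is not the code from TFA (which isn't reproduced in this comment); the cubic polynomial, its coefficients, and the function names `poly_horner` and `poly_split` are all made up for illustration. The "clever" Horner form does fewer multiplies, but every step waits on the previous one. The plainer split form does one extra multiply, yet three of its multiplies are independent within a single iteration, so a core with several ALUs can, in principle, overlap them. Whether it actually does depends on the compiler and the hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical coefficients for c0 + c1*x + c2*x^2 + c3*x^3 (mod 2^32). */
static const uint32_t C0 = 5u, C1 = 3u, C2 = 2u, C3 = 7u;

/* "Optimized" form (Horner): 3 multiplies, 3 adds, but each line
   depends on the result of the line before it, so the work within
   one iteration is a single serial chain. */
void poly_horner(const uint32_t *x, uint32_t *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        uint32_t r = C3;
        r = r * x[i] + C2;
        r = r * x[i] + C1;
        r = r * x[i] + C0;
        out[i] = r;
    }
}

/* Plainer split form: 4 multiplies, 3 adds. x2, lo and hi depend only
   on x[i], not on each other, so they are candidates for issuing on
   separate ALUs in the same iteration. */
void poly_split(const uint32_t *x, uint32_t *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        uint32_t x2 = x[i] * x[i];      /* independent of lo and hi */
        uint32_t lo = C0 + C1 * x[i];   /* independent of x2 and hi */
        uint32_t hi = C2 + C3 * x[i];   /* independent of x2 and lo */
        out[i] = lo + hi * x2;          /* joins the three results  */
    }
}
```

Both functions use unsigned 32-bit arithmetic, so they wrap identically and return the same values; only the shape of the dependency graph differs. None of this involves SIMD, which is the "regardless of vectorization instructions" part.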