by hansvm
1477 days ago
It's a different issue altogether. Even in vectorized code, artificial data dependencies are an issue. E.g., for fast dot products (usually not applicable at most input sizes because memory bandwidth dominates, but related problems can be CPU-bound), you'll roughly double your execution speed by alternating between two running totals and merging them at the end, and that gain is mostly orthogonal to whether those running totals are vectorized. Edit: Many modern compilers have been doing that particular optimization for a few years anyway, but it's still an important idea to keep in mind for any non-trivial graph of operations.
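To make the two-running-totals idea concrete, here's a minimal scalar sketch in C. The function names `dot_one_acc` and `dot_two_acc` are made up for illustration. In the first version every add has to wait for the previous one to finish; in the second, the two chains are independent, so the CPU can overlap them. Whether you actually see close to 2x depends on the hardware and on whether the loop is compute-bound at all. Note also that splitting the sum reorders the floating-point additions, so the result can differ in the last bits, which is why compilers generally need explicit permission (e.g. a fast-math style flag) before doing this to floats on their own.

```c
#include <stddef.h>

/* One running total: each iteration's add depends on the previous
 * iteration's result, forming a single serial dependency chain. */
double dot_one_acc(const double *a, const double *b, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}

/* Two running totals: even-indexed products go to s0, odd-indexed to s1.
 * The two chains do not depend on each other, so their adds can overlap. */
double dot_two_acc(const double *a, const double *b, size_t n) {
    double s0 = 0.0, s1 = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i] * b[i];
        s1 += a[i + 1] * b[i + 1];
    }
    if (i < n) {
        /* Leftover element when n is odd. */
        s0 += a[i] * b[i];
    }
    /* Merge the two totals at the end. */
    return s0 + s1;
}
```

The same split applies unchanged if `s0` and `s1` are SIMD registers instead of scalars, which is the sense in which the gain is orthogonal to vectorization.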