by yoklov
4110 days ago
The reason to use intrinsics or inline assembly (the latter is pretty rare these days; intrinsics are much more common) isn't only about beating the compiler. When you rely on the compiler to vectorize, you run the risk of a subtle, innocuous change to the code breaking the vectorization, and this will happen a lot. Also, when you target multiple compilers, it's very difficult to get reliable performance across all of them unless you do the vectorizing yourself. Not to mention, compilers tend to do great on simple test cases like these, but totally barf as soon as the loop becomes more complex (try adding some conditionals to the loops some time; it's not that these loops can't be vectorized, it's just that the compiler doesn't know how). Getting the best performance out of vectorization is mostly about organizing the data so that it can be easily vectorized. If you've already gone through that work, it's fairly pointless not to take the extra effort to guarantee that you're getting the performance you expect.
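
To make the "conditionals in the loop" point concrete, here is a rough sketch of what doing it by hand can look like. It assumes an x86/x86-64 target with SSE available; the function names `select_scalar` and `select_sse` and the particular loop body are made up for illustration and don't come from the code being discussed. The idea is to compute both sides of the branch for all four lanes and blend them with a comparison mask, which is one common way to vectorize a simple `if`/`else`.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics; assumes an x86/x86-64 target */

/* Scalar reference version. The branch in the loop body is the kind of
 * thing that may or may not get auto-vectorized, depending on the
 * compiler and flags. */
void select_scalar(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (a[i] > b[i])
            out[i] = a[i] - b[i];
        else
            out[i] = a[i] + b[i];
    }
}

/* Hand-vectorized version: evaluate both branches for 4 floats at a
 * time, then pick per lane using the comparison mask. */
void select_sse(const float *a, const float *b, float *out, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va   = _mm_loadu_ps(a + i);      /* unaligned loads */
        __m128 vb   = _mm_loadu_ps(b + i);
        __m128 mask = _mm_cmpgt_ps(va, vb);     /* all-ones where a > b */
        __m128 diff = _mm_sub_ps(va, vb);       /* "if" branch */
        __m128 sum  = _mm_add_ps(va, vb);       /* "else" branch */
        /* (mask & diff) | (~mask & sum) */
        __m128 res  = _mm_or_ps(_mm_and_ps(mask, diff),
                                _mm_andnot_ps(mask, sum));
        _mm_storeu_ps(out + i, res);
    }
    /* Scalar tail for the remaining 0..3 elements. */
    for (; i < n; i++)
        out[i] = (a[i] > b[i]) ? a[i] - b[i] : a[i] + b[i];
}
```

Both functions take plain contiguous `float` arrays, which ties back to the data layout point: the intrinsics version is only this short because the inputs are already laid out so that four consecutive elements can be loaded at once.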