by paulsutter
3490 days ago
Yes, a compiler can generate the instruction. But if it's alone in a for loop surrounded by random STL classes that, even if inlined, are bodging up the pipeline or (gasp) causing spurious random DRAM accesses, there's little performance gain. And that's what usually happens in C++ code that wasn't already designed for AVX ("it's using AVX, but it's not running any faster. I guess AVX doesn't make much difference"). Net-net, data and code need to be structured for AVX to achieve the potential performance gains, and that's 80% of the work. Once you structure the data and code for AVX, yes, you can use regular C statements, then experiment with optimization flags until the compiler generates the intended instructions (and hasn't introduced excessive register spills). But it's hard to see how that's any easier than using the intrinsics.
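To make "structured for AVX" a bit more concrete, here is a minimal sketch of the kind of layout and loop that tends to vectorize. The names `ParticlesSoA` and `advance` are made up for illustration and are not from any library. Whether the compiler actually emits AVX instructions still depends on the compiler and flags (for example `-O3 -mavx2` with GCC or Clang), so treat this as a starting point to inspect, not a guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical particle data in structure-of-arrays form: each field is one
// contiguous float array, so consecutive elements of a field can be loaded
// into a single vector register.
struct ParticlesSoA {
    std::vector<float> x;   // positions
    std::vector<float> vx;  // velocities
};

// Plain scalar loop with no function calls and no data-dependent branches
// in the body. The raw pointers are taken from the containers once, before
// the loop, so the loop body only touches flat arrays.
inline void advance(ParticlesSoA& p, float dt) {
    assert(p.x.size() == p.vx.size());
    const std::size_t n = p.x.size();
    float* x = p.x.data();
    const float* vx = p.vx.data();
    for (std::size_t i = 0; i < n; ++i) {
        x[i] += vx[i] * dt;
    }
}
```

GCC's `-fopt-info-vec` and Clang's `-Rpass=loop-vectorize` report whether a loop like this was vectorized, which is usually quicker than reading the disassembly.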
You are correct that, generally speaking, most STL-heavy code would be hard to vectorize and unlikely to gain much advantage (plus there are the valarray misadventures). You will sometimes see clang and gcc vectorize loops over std::vector if the code is simple enough and they can assume strict aliasing. Intel's compiler has historically been less aggressive about assuming strict aliasing.
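As a rough illustration of "simple enough", here is a sketch of a std::vector loop of the kind clang and gcc can often vectorize at higher optimization levels. The function name `add` is hypothetical. The aliasing angle, as I understand it: under type-based alias rules a store to a `float` cannot modify the vector's internal pointer members, so the compiler can keep the data pointers in registers across iterations. It may still need a runtime overlap check between `out` and the inputs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element-wise sum: out[i] = a[i] + b[i].
// The size is read once and the body is a single arithmetic statement,
// which keeps the loop within reach of the auto-vectorizer.
inline void add(const std::vector<float>& a,
                const std::vector<float>& b,
                std::vector<float>& out) {
    assert(a.size() == b.size() && out.size() == a.size());
    const std::size_t n = out.size();
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```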
Various proposals are working their way through the standards committee to add explicit support for SIMD programming. For example, if something like http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n418... were to be standardized, we could write matrix multiply explicitly as:
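A minimal sketch of what explicit SIMD-style code of this kind could look like. `Vec4f` here is a hypothetical 4-lane type written out in plain C++ purely for illustration; it is not the API of the linked proposal, and a real standardized type would map onto hardware registers instead of a float array. The sketch assumes row-major square matrices whose dimension is a multiple of the vector width.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 4-lane float vector standing in for a standardized SIMD type.
struct Vec4f {
    static constexpr std::size_t width = 4;
    float lane[4];

    // Fill every lane with the same scalar.
    static Vec4f broadcast(float s) { return {{s, s, s, s}}; }
    // Read four consecutive floats starting at p.
    static Vec4f load(const float* p) { return {{p[0], p[1], p[2], p[3]}}; }
    // Write the four lanes to consecutive floats starting at p.
    void store(float* p) const {
        for (std::size_t i = 0; i < width; ++i) p[i] = lane[i];
    }
};

// Lane-wise addition.
inline Vec4f operator+(Vec4f a, Vec4f b) {
    Vec4f r;
    for (std::size_t i = 0; i < Vec4f::width; ++i) r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

// Lane-wise multiplication.
inline Vec4f operator*(Vec4f a, Vec4f b) {
    Vec4f r;
    for (std::size_t i = 0; i < Vec4f::width; ++i) r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}

// C = A * B for row-major n x n matrices; n must be a multiple of the width.
// Each step of the k loop broadcasts one element of A and multiplies it by
// four adjacent elements of a row of B, so four outputs of C accumulate at once.
inline void matmul(const float* A, const float* B, float* C, std::size_t n) {
    assert(n % Vec4f::width == 0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; j += Vec4f::width) {
            Vec4f acc = Vec4f::broadcast(0.0f);
            for (std::size_t k = 0; k < n; ++k) {
                acc = acc + Vec4f::broadcast(A[i * n + k]) * Vec4f::load(&B[k * n + j]);
            }
            acc.store(&C[i * n + j]);
        }
    }
}
```

The point of writing it this way is that the vector width and the broadcast/load pattern are stated in the source, so nothing depends on the auto-vectorizer recognizing the loop.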
For my own work on vector languages and compilers, I've had an easier time of it, since those languages were designed to enable simpler SIMD code generation.