by dragontamer 1788 days ago
Autovectorization has issues with function calls.

"#pragma omp declare simd" applies to a function declaration and tells the compiler to also generate SIMD versions of that function, which then allows it to be called inside of a "#pragma omp for simd" loop without blocking vectorization.

A few keywords here and there really help the autovectorizer get closer to a CUDA-like environment (like... actually having your SIMD code extend "through" a function call, so you can split the work up into functions a bit better).
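To make that concrete, here's a minimal sketch of the pattern. The names scale_add and apply are made up for illustration, and I'm using plain "#pragma omp simd" on the loop to keep it single-threaded; "#pragma omp for simd" would additionally share the iterations among threads. Whether the compiler actually emits vector code depends on the compiler, target and flags, so treat this as a sketch rather than a guarantee.

```c
#include <stddef.h>

/* Hypothetical helper. The pragma asks the compiler to generate SIMD
   variants of this function in addition to the scalar one, so that a
   vectorized loop can call it several lanes at a time. */
#pragma omp declare simd
float scale_add(float x, float y)
{
    return 2.0f * x + y;
}

/* Hypothetical caller. The simd directive tells the compiler the loop
   iterations may be executed in SIMD lanes; the call to scale_add can
   then be mapped to one of its SIMD variants instead of forcing the
   loop back to scalar code. */
void apply(float *restrict out, const float *restrict a,
           const float *restrict b, size_t n)
{
    #pragma omp simd
    for (size_t i = 0; i < n; i++)
        out[i] = scale_add(a[i], b[i]);
}
```

With GCC or Clang the pragmas are only honored when building with -fopenmp (or -fopenmp-simd if you want the SIMD directives without the threading runtime); otherwise they are ignored and the code still compiles as ordinary scalar C.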

EDIT: Here's an example from Intel's ICC: https://software.intel.com/content/www/us/en/develop/documen...

1 comment

I took the example program from the OpenMP standard and built it with GCC 11 -Ofast. -fopt-info said the relevant loop was vectorized. Adding -fopenmp gave more vectorization messages from elsewhere, but I don't have time to figure out the difference from the tree dump (not being good with assembler). Doubtless the directives can help, but you do need to get them right, and I trust GCC more than myself!