by dragontamer
1788 days ago
Autovectorization struggles with function calls: unless the callee gets inlined, the compiler typically has to leave the loop scalar. "#pragma omp declare simd" is applied to a function declaration and tells the compiler to generate vector versions of that function, which then lets it be called from inside a "#pragma omp for simd" loop. A few pragmas here and there really help the autovectorizer get closer to a CUDA-like environment (as in, your SIMD code actually extends "through" a function call, so you can start splitting the work across functions a bit better). EDIT: Here's an example from Intel's ICC: https://software.intel.com/content/www/us/en/develop/documen...
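For anyone who hasn't seen the pragmas in use, here is a rough sketch of the pattern, not taken from the Intel page. The names scale_and_offset and transform are made up for illustration, and I'm using plain "#pragma omp simd" rather than "#pragma omp for simd" so the example only vectorizes and doesn't also split the loop across threads.

```c
#include <stddef.h>

/* Hypothetical per-element kernel (name invented for this sketch).
   "declare simd" asks the compiler to also emit vector variants of the
   function, so a call from a SIMD loop does not force scalar code.
   uniform(...) says these arguments are the same for every lane;
   notinbranch says the function is never called under a condition. */
#pragma omp declare simd uniform(scale, offset) notinbranch
float scale_and_offset(float x, float scale, float offset)
{
    return x * scale + offset;
}

/* Hypothetical driver: applies the kernel to every element of src. */
void transform(float *restrict dst, const float *restrict src,
               size_t n, float scale, float offset)
{
    /* Ask for this loop to be vectorized, including the call inside it. */
    #pragma omp simd
    for (size_t i = 0; i < n; i++) {
        dst[i] = scale_and_offset(src[i], scale, offset);
    }
}
```

With GCC or Clang this should build with -fopenmp-simd (or -fopenmp). Without either flag the pragmas are simply ignored and you get the ordinary scalar code, so the result is the same either way and only the generated instructions differ.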