by phkahler
1773 days ago
My biggest complaint is that SIMD opens the door to doing proper vector math (dot product, cross product, vector addition, etc.) in a single instruction, yet languages don't expose it directly. I'd like to see languages like C++ and Rust adopt 2-, 3-, and 4-element vectors as first-class types and use these registers and instructions for them. I don't want to use language features to define my own vector types and then figure out how to minimize the overhead of the implementation. I tried that years ago in C++ and ended up not using some of my overloaded operators because they were too slow. Some of that can be overcome with C++11 features (I think), or with other techniques. The point is that vector math is so common that I think it deserves first-class support in languages.
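As a rough illustration of the hand-rolled approach described here, a minimal C++11 sketch might look like the following. `Vec3`, `dot`, `cross`, and `vec3_example` are hypothetical names, not part of any standard library, and whether the compiler maps these operations onto SIMD registers depends on the compiler, target, and optimization flags.

```cpp
#include <cassert>

// Hypothetical 3-element vector type (illustrative only).
struct Vec3 {
    float x, y, z;
};

// Small constexpr free functions taking arguments by value, so the
// compiler is free to inline them and keep operands in registers.
constexpr Vec3 operator+(Vec3 a, Vec3 b) {
    return {a.x + b.x, a.y + b.y, a.z + b.z};
}

constexpr Vec3 operator*(float s, Vec3 v) {
    return {s * v.x, s * v.y, s * v.z};
}

constexpr float dot(Vec3 a, Vec3 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

constexpr Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y,
            a.z * b.x - a.x * b.z,
            a.x * b.y - a.y * b.x};
}

// Usage example: the unit x and y axes are orthogonal, and their
// cross product is the unit z axis.
inline void vec3_example() {
    const Vec3 ex{1.0f, 0.0f, 0.0f};
    const Vec3 ey{0.0f, 1.0f, 0.0f};
    assert(dot(ex, ey) == 0.0f);
    assert(cross(ex, ey).z == 1.0f);
    assert((ex + 2.0f * ey).y == 2.0f);
}
```

This is the kind of boilerplate a built-in vector type would make unnecessary; it says nothing about how fast the generated code ends up being.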
It's clear to anyone who has tried it that the NVIDIA CUDA / OpenCL / Intel ISPC approach is superior. Treating each SIMD lane as a thread is easier to understand than you might expect.
NVIDIA CUDA and AMD ROCm/HIP are your C++ languages that compile into SIMD code. OpenCL isn't really C++ but is loosely associated with it. Intel is doing the oneAPI thing, but I don't know much about it yet.
Python, Julia, and other high-level languages are also moving toward the "SIMD lanes as threads" approach. It's just fundamentally easier to think about.
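To make the "lanes as threads" idea concrete, here is a minimal sketch in plain C++ rather than real CUDA or ISPC. The names `saxpy_lane` and `launch_saxpy` are hypothetical. The body is written for a single lane, and a stand-in launcher runs it once per index. In CUDA the loop would be replaced by a kernel launch, with the index derived from `threadIdx`, `blockIdx`, and `blockDim`; in ISPC it would come from `programIndex`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical "kernel": written from the point of view of ONE lane,
// identified only by its index.
inline void saxpy_lane(std::size_t lane, float a, const float* x, float* y) {
    y[lane] = a * x[lane] + y[lane];
}

// Hypothetical launcher: a sequential stand-in for the runtime or
// compiler that would normally spread the lanes across SIMD hardware.
inline void launch_saxpy(float a, const std::vector<float>& x,
                         std::vector<float>& y) {
    assert(x.size() == y.size());
    for (std::size_t lane = 0; lane < x.size(); ++lane) {
        saxpy_lane(lane, a, x.data(), y.data());
    }
}
```

The per-lane function never mentions vector width or registers, which is the property that makes this model easy to reason about.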