by dragontamer
1773 days ago
I'm not convinced that the "4-vector" is a very useful C++ concept. Sure, it maps easily to 4-wide SIMD registers, but is that really what you want?

It's clear to anyone who has tried it that the NVIDIA CUDA / OpenCL / Intel ISPC approach is superior. Seeing each SIMD lane as a thread is easier to understand than you'd expect.

NVIDIA CUDA and AMD ROCm/HIP are your C++ languages that compile into SIMD code. OpenCL isn't really C++, but it's kind of associated with it. Intel is doing the oneAPI thing, but I don't know much about it yet.

Python, Julia, and other high-level languages are also moving toward the "SIMD lanes as threads" approach. It's just fundamentally easier to think about.
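To make the contrast concrete, here's a rough sketch in plain C++ (not actual CUDA or ISPC). All the names (`Vec4`, `saxpy_vec4`, `saxpy_lane`, `launch`) are made up for illustration, and `launch` is just a serial loop standing in for whatever a real SPMD compiler would map onto SIMD lanes.

```cpp
#include <cstddef>

// Style 1: explicit 4-vector. The programmer packs data into
// register-shaped chunks and has to deal with padding and tails.
struct Vec4 {
    float v[4];
};

inline Vec4 operator+(const Vec4& a, const Vec4& b) {
    Vec4 r;
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
}

inline Vec4 operator*(float s, const Vec4& a) {
    Vec4 r;
    for (int i = 0; i < 4; ++i) r.v[i] = s * a.v[i];
    return r;
}

// n4 counts Vec4 elements, so the caller must pad the data to a multiple of 4.
void saxpy_vec4(float a, const Vec4* x, Vec4* y, std::size_t n4) {
    for (std::size_t i = 0; i < n4; ++i) y[i] = a * x[i] + y[i];
}

// Style 2: lane-as-thread. The kernel is ordinary scalar code for ONE lane;
// the vector width never appears in the source.
void saxpy_lane(std::size_t lane, float a, const float* x, float* y) {
    y[lane] = a * x[lane] + y[lane];
}

// Hypothetical stand-in for a CUDA-style launch: run the kernel once per lane.
// Here it is a plain loop; a real SPMD toolchain would spread it across lanes.
template <typename Kernel, typename... Args>
void launch(std::size_t n, Kernel kernel, Args... args) {
    for (std::size_t lane = 0; lane < n; ++lane) kernel(lane, args...);
}
```

Something like `launch(n, saxpy_lane, 2.0f, x, y)` works for any `n`, including ones that aren't a multiple of 4, while the `Vec4` version bakes the register width into the data layout.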