> SIMD
Intrinsics are directly callable from C++.
> likely/unlikely branches
Most compilers have extensions that let you do this (__builtin_expect and so on).
> in-lining can't be forced when you know it gives better performance
Again, most compilers have this, not just GCC, e.g. __forceinline.
> the compiler has a lot of trouble knowing when lines of code are independent and can be done in parallel (b/c const =/= immutable)
This is true, as aliasing is a real issue. The hardware itself has some say over this anyway, depending on its instruction scheduling and out-of-order execution capabilities.
What you don't mention, however, is that almost no other languages offer any of these, let alone all of them. Rust may be the exception here, although some of this is still in the works (SIMD; I'm not sure about the status of likely/unlikely intrinsics). For GPU programming, if you're using CUDA, you're almost certainly using C or C++, or calling something that wraps C/C++ code. Not everything is suited to GPU processing anyway; there's still a lot of code that needs to be performant and isn't moving off the CPU any time soon.
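For reference, here is roughly what the compiler extensions mentioned above look like in practice. This is a minimal sketch, assuming GCC, Clang, or MSVC; the macro names `LIKELY`, `UNLIKELY`, `FORCE_INLINE`, `RESTRICT` and the functions `scale` and `parse_digit` are hypothetical, not taken from the thread. It also shows the portability cost: each hint needs per-compiler spellings plus a fallback. (C++20 adds standard `[[likely]]`/`[[unlikely]]` attributes, but there is still no standard `restrict` or forced inlining.)

```cpp
#include <cstddef>

// Branch hints (hypothetical macro names): GCC/Clang provide __builtin_expect;
// other compilers fall back to a plain expression with no hint.
#if defined(__GNUC__) || defined(__clang__)
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x)   (x)
#define UNLIKELY(x) (x)
#endif

// Forced inlining: MSVC spells it __forceinline, GCC/Clang use an attribute.
#if defined(_MSC_VER)
#define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#define FORCE_INLINE inline __attribute__((always_inline))
#else
#define FORCE_INLINE inline
#endif

// No-alias promise: __restrict is a non-standard extension accepted by
// GCC, Clang, and MSVC. Passing overlapping pointers is undefined behavior.
#if defined(__GNUC__) || defined(__clang__) || defined(_MSC_VER)
#define RESTRICT __restrict
#else
#define RESTRICT
#endif

FORCE_INLINE float scale_one(float x, float k) { return x * k; }

// Because out and in are declared non-aliasing, the compiler may vectorize
// this loop without emitting runtime overlap checks.
void scale(float* RESTRICT out, const float* RESTRICT in, std::size_t n, float k) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = scale_one(in[i], k);
    }
}

// Returns the digit value of c, or -1 on the (assumed rare) error path.
int parse_digit(char c) {
    if (UNLIKELY(c < '0' || c > '9')) {
        return -1;
    }
    return c - '0';
}
```

Calling `scale(out, in, n, 2.0f)` doubles each element, and `parse_digit('x')` returns -1. Whether any of these hints actually changes the generated code depends on the compiler and optimization level.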
I'm not saying you can't get C++ to output the assembly you want; it just sucks trying to coerce it into doing things that are honestly not that complicated. And even when you do get what you want, you find you can't use the code anywhere else. To me that feels like a language failure...
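As an illustration of the "can't use the code anywhere else" problem, here is a minimal sketch of an element-wise add using SSE intrinsics. It assumes an x86/x86-64 target for the fast path; the function name `add_arrays` and the `HAVE_SSE` macro are hypothetical. On any other architecture (ARM, for instance, would need a separate NEON path via `arm_neon.h`) it silently falls back to the scalar loop.

```cpp
#include <cstddef>

// Detect SSE (hypothetical HAVE_SSE macro): GCC/Clang define __SSE__;
// MSVC defines _M_X64 on x64 and _M_IX86_FP on 32-bit x86.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>  // SSE intrinsics, x86/x86-64 only
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

// Element-wise add: out[i] = a[i] + b[i] for i in [0, n).
void add_arrays(float* out, const float* a, const float* b, std::size_t n) {
    std::size_t i = 0;
#if HAVE_SSE
    // Fast path: four floats per iteration using unaligned loads and stores.
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    // Scalar tail, and the entire loop on targets without SSE.
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Even this trivial loop needs a preprocessor guard and a duplicate scalar path, and supporting a second instruction set means writing the vector part again.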
> is the fact that almost no other languages offer any of these
I guess you missed my point. It seems to me that we've reached the stage where you no longer need these features in your core application language. The idea is that with OpenCL/SPIR-V we'll be able to:
1- be more explicit and not fight the language (so even if you're 100% on the CPU it makes sense)
2- target every platform (you can finally write code for your GPU)
3- call the code from any parent language
You're right that not all performance-critical problems boil down to tight shared-memory loops that can be thrown onto an OpenCL kernel, but my experience so far tells me that that's the vast majority of performance problems. So C++'s usefulness will shrink. But maybe my experience is biased and I'm off base. I haven't done much OpenCL myself, but I'm definitely planning to use it more in the future.
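To make the "tight loop thrown onto a kernel" idea concrete, here is a minimal CPU-only sketch of the execution model, written in plain C++ rather than OpenCL. The names `saxpy_item` and `saxpy` are hypothetical; in real OpenCL C the per-item body would live in a `__kernel` function and the index would come from `get_global_id(0)`, with the host enqueueing it over an n-item range instead of looping.

```cpp
#include <cstddef>
#include <vector>

// Per-work-item body (hypothetical): computes exactly one output element.
// In OpenCL C, gid would be obtained from get_global_id(0).
inline void saxpy_item(std::size_t gid, float alpha,
                       const float* x, const float* y, float* out) {
    out[gid] = alpha * x[gid] + y[gid];
}

// CPU stand-in for launching the kernel over a 1-D range of x.size() items.
// Assumes x and y have the same length. Every iteration is independent,
// which is what lets a runtime run them in any order or in parallel.
std::vector<float> saxpy(float alpha, const std::vector<float>& x,
                         const std::vector<float>& y) {
    std::vector<float> out(x.size());
    for (std::size_t gid = 0; gid < x.size(); ++gid) {
        saxpy_item(gid, alpha, x.data(), y.data(), out.data());
    }
    return out;
}
```

Problems that fit this shape (no cross-iteration dependencies, no aliasing between inputs and output) are the ones that port cleanly to a kernel; anything with irregular control flow or shared mutable state does not.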