Hacker News
by yuushi 3029 days ago
> SIMD

Intrinsics are directly callable from C++.
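A minimal sketch of calling SSE intrinsics directly, assuming an x86 target built with GCC or Clang. `_mm_loadu_ps`, `_mm_add_ps` and `_mm_storeu_ps` come from `<immintrin.h>`. The `__SSE2__` check is the GCC/Clang convention, so MSVC would need a different guard (for example `_M_X64`). Other targets fall back to a plain loop. The name `add_arrays` is hypothetical.

```cpp
#include <cstddef>

#if defined(__SSE2__)
#include <immintrin.h>  // x86 SIMD intrinsics
#endif

// Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n).
void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
#if defined(__SSE2__)
    // Four floats per iteration in 128-bit SSE registers.
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  // unaligned load of 4 floats
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    // Scalar tail, or the whole array when SSE2 is not available.
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```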

> likely/unlikely branches

Most compilers have extensions that will allow you to do this (__builtin_expect and so on).
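For example, with GCC or Clang the hint can be written with `__builtin_expect` directly. This is a minimal, GCC/Clang-only sketch, and `checked_div` is a hypothetical function. The builtin evaluates to its first argument. The second argument is the value the compiler should treat as most likely, so marking the divide-by-zero check as expected-false lets the optimizer favor the common path.

```cpp
// GCC/Clang-specific: __builtin_expect(expr, expected) returns expr and
// tells the optimizer that expr will usually equal `expected`.
long checked_div(long num, long den, bool* ok) {
    if (__builtin_expect(den == 0, 0)) {  // rare error path
        *ok = false;
        return 0;
    }
    *ok = true;  // common path
    return num / den;
}
```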

> in-lining can't be forced when you know it gives better performance

Again, most compilers have this, not just GCC, e.g. MSVC's __forceinline.
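In GCC and Clang, the `always_inline` attribute plays the role of MSVC's `__forceinline`. This is a minimal sketch in GCC/Clang syntax, and `dot3` is a hypothetical hot helper.

```cpp
// GCC/Clang: always_inline inlines the function regardless of the usual
// inlining heuristics, even when optimization is off.
// MSVC spells this __forceinline instead.
inline __attribute__((always_inline)) float dot3(const float* a, const float* b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}
```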

> the compiler has a lot of trouble knowing when lines of code are independent and can be done in parallel (b/c const =/= immutable)

This is true, as aliasing is a real issue. The hardware itself has some say over this anyway, depending on its instruction scheduling and out-of-order execution (OoOE) capabilities.
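The following loop illustrates the aliasing point. Without extra information, the compiler has to allow for `dst` overlapping `src`, and that can prevent or complicate reordering and vectorization. GCC, Clang and MSVC all accept the non-standard `__restrict` qualifier as a promise that the pointers do not alias. This is a minimal sketch, and `scale_into` is hypothetical. Keeping the promise is the caller's job, because passing overlapping buffers is undefined behavior.

```cpp
#include <cstddef>

// Without __restrict, a store to dst[i] could change src[j], so the compiler
// must be conservative about reordering loads and stores.
// __restrict (a GCC/Clang/MSVC extension, not standard C++) promises that
// dst and src do not overlap; breaking that promise is undefined behavior.
void scale_into(float* __restrict dst, const float* __restrict src,
                float k, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i] * k;
    }
}
```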

What you don't mention, however, is the fact that almost no other languages offer any of these, let alone all of them. Rust may be the exception here, although some of this is still in the works (SIMD; I'm not sure about the status of likely/unlikely intrinsics).

For GPU programming, if you're using CUDA, you're almost certainly using C or C++, or calling something that wraps C/C++ code. Not everything is suited to GPU processing anyway; there's still a lot of code that needs to be performant and isn't moving off the CPU any time soon.

right, so things that are not part of the language, not crossplatform and not crosscompiler. That's called fighting the language in my book :)

I'm not saying you can't get C++ to output the assembly you want; it just sucks trying to coerce it to do things that are honestly not that complicated. And even when you do get what you want, you find you can't use the code anywhere else. To me that feels like a language failure...

> is the fact that almost no other languages offer any of these

I guess you missed my point. It seems to me that we're at a point where you no longer need these features as part of your core application language. The idea is that with OpenCL/SPIR-V we'll be able to

1- be more explicit and not fight the language (so even if you're 100% on the CPU it makes sense)

2- target every platform (you can finally write code for your GPU)

3- call that code from any parent language

You're right that not all performance-critical problems boil down to tight shared-memory loops that can be thrown onto an OpenCL kernel, but my experience so far tells me that those make up the vast majority of performance problems. So C++'s usefulness will shrink. But maybe my experience is biased and I'm off base. I haven't done much OpenCL myself, but I'm definitely planning to use it more in the future.

> right, so things that are not part of the language, not crossplatform and not crosscompiler

You just have a header with different #defines for the different platforms you are going to ship on, or use a premade open source one.
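Here is a sketch of such a header, assuming GCC, Clang and MSVC are the compilers you ship with. The macro names `LIKELY`, `UNLIKELY`, `FORCE_INLINE`, the file name `opt_hints.h` and `clamp_index` are all hypothetical. Unknown compilers get no-op definitions, so the code still builds everywhere, just without the hints.

```cpp
// opt_hints.h (hypothetical): one place that maps optimization hints
// onto each compiler's own extension.
#if defined(__GNUC__)  // GCC, and Clang (which also defines __GNUC__)
#define LIKELY(x)    __builtin_expect(!!(x), 1)
#define UNLIKELY(x)  __builtin_expect(!!(x), 0)
#define FORCE_INLINE inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#define LIKELY(x)    (x)  // MSVC has no __builtin_expect
#define UNLIKELY(x)  (x)
#define FORCE_INLINE __forceinline
#else  // unknown compiler: still builds, but without the hints
#define LIKELY(x)    (x)
#define UNLIKELY(x)  (x)
#define FORCE_INLINE inline
#endif

// Example use: a hot helper whose clamping branches are rarely taken.
FORCE_INLINE int clamp_index(int i, int size) {
    if (UNLIKELY(i >= size)) return size - 1;
    if (UNLIKELY(i < 0)) return 0;
    return i;
}
```

Code that includes this header only ever writes `LIKELY(...)` or `FORCE_INLINE`, so porting to a new compiler means touching one file rather than every call site.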

If you want to ship on everything, you won't get every optimization everywhere. It would be better if some of these features were in the standard, but in practice it isn't such a big issue for those two in particular.