by amluto 206 days ago
> C++ maps almost perfectly to the hardware with minimal overhead

Barely.

The C++ aliasing rules map quite poorly into hardware. C++ barely helps at all with writing correct multithreaded code, and almost all non-tiny machines have multiple CPUs. C++ cannot cleanly express floating-point semantics in which operations are associative, and SIMD optimizations care about this. C++ exceptions have huge overhead when actually thrown.
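
To make the floating-point point concrete, here is a minimal sketch. The function names are made up for illustration, and it assumes `float` arithmetic is evaluated in single precision (the usual case on x86-64 and ARM64). Float addition is not associative, so a compiler that follows the language rules cannot turn the first loop into the second on its own, even though the second is the shape a 4-wide SIMD reduction wants.

```cpp
#include <cstddef>

// Adds strictly left to right. Unless reassociation is explicitly allowed
// (for example -ffast-math on GCC/Clang), the compiler has to keep this order.
float sum_in_order(const float* data, std::size_t n) {
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}

// Hypothetical hand-reordered version with four independent partial sums,
// roughly the shape a 4-wide SIMD reduction would take.
float sum_four_lanes(const float* data, std::size_t n) {
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        for (std::size_t k = 0; k < 4; ++k) {
            lane[k] += data[i + k];
        }
    }
    float total = (lane[0] + lane[1]) + (lane[2] + lane[3]);
    for (; i < n; ++i) {
        total += data[i];  // leftover elements that did not fill a group of four
    }
    return total;
}
```

On an input like `{1e8f, 1.0f, -1e8f, 1.0f}` the two orderings round differently and can disagree, which is why the reordering needs an opt-in that the core language has no clean way to express per operation.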


> C++ exceptions have huge overhead when actually thrown

Which is why exceptions should never really be used for control flow. In our code, an exception basically means "the program is closing imminently, you should probably clean up and leave things in a sensible state if needed."

Agree with everything else mostly. C/C++ being a "thin layer on top of hardware" was sort of true 20? 30? years ago.

In simulations or in game dev, the practice is to use an SoA data layout to avoid aliasing entirely. Job systems or actors are used for handling multithreading. In machine learning, most parallelism is achieved through GPU offloading or CPU intrinsics. I agree in principle with everything you’re saying, but that doesn’t mean the ecosystem isn’t creative when it comes to working around these hiccups.
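
For anyone unfamiliar with the term, a rough sketch of the AoS versus SoA difference follows. All type and function names here are invented for the example. One hedge: as far as I know, SoA on its own does not prove to the compiler that two arrays are disjoint. What it mainly buys is contiguous, unit-stride access per field, and the compiler may still emit a runtime overlap check before vectorizing.

```cpp
#include <cstddef>
#include <vector>

// Array-of-structs: one record per particle, fields interleaved in memory.
struct ParticleAoS {
    float x, y, z, mass;
};

// Struct-of-arrays: one contiguous array per field.
struct ParticlesSoA {
    std::vector<float> x, y, z, mass;
};

// Updates only the x positions.
// Assumption: vx holds at least p.x.size() elements.
void advance_x(ParticlesSoA& p, const std::vector<float>& vx, float dt) {
    float* x = p.x.data();
    const float* v = vx.data();
    const std::size_t n = p.x.size();
    for (std::size_t i = 0; i < n; ++i) {
        x[i] += v[i] * dt;  // touches only the x and vx arrays
    }
}
```

With the AoS layout the same update would stride over `y`, `z` and `mass` it never reads, which is the usual argument for SoA in hot loops.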

> The C++ aliasing rules map quite poorly into hardware.

But how much does aliasing matter on modern hardware? I know you're aware of Linus' position on this, I personally find it very compelling :)

As a silly little test a few months ago, I built whole Linux systems with -fno-strict-aliasing in CFLAGS. Everything I've tried on them is within 1% of the original performance.

Even with strict aliasing, C and C++ often have to assume aliasing when none exists.
If they somehow magically didn't, how much could be gained?

I've never seen an attempt to answer that question. Maybe it's unanswerable in practice. But the examples of aliasing optimizations always seem to be eliminating a load, which in my experience is not an especially impactful thing in the average userspace widget written in C++.
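
For reference, the "eliminating a load" case usually looks something like the sketch below. The function names are hypothetical, and what a given compiler actually emits depends on its version and flags, so treat the comments as what the language rules permit, not as a measured result.

```cpp
// Stores go through a float*, while the loop bound is read through an int*.
// Under strict aliasing the compiler may assume a float store cannot change
// *n, so it is allowed to load *n once before the loop.
void fill_float(float* out, const int* n) {
    for (int i = 0; i < *n; ++i) {
        out[i] = 0.0f;
    }
}

// Same loop, but the stores are ints too. Now out[i] could legally be the
// same object as *n, so *n generally has to be reloaded on every iteration.
void fill_int(int* out, const int* n) {
    for (int i = 0; i < *n; ++i) {
        out[i] = 0;
    }
}
```

Building with -fno-strict-aliasing puts `fill_float` in the same position as `fill_int`: one extra load per iteration, which is likely to hit L1 anyway.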

The closest example of a more sophisticated aliasing optimization I've seen is example 18 in this paper: https://dl.acm.org/doi/pdf/10.1145/3735592

...but that specific example with a pointer passed to a function seems analogous to what is possible with 'restrict' in C. Maybe I misunderstood it.
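
A small sketch of the `restrict` comparison, written in C++ with made-up function names. Note that `__restrict` is a compiler extension in C++ (accepted by GCC, Clang and MSVC); standard C spells it `restrict`. This is not the paper's example, only the same general shape: pointers passed into a function, with and without a no-overlap promise.

```cpp
#include <cstddef>

// Plain pointers of the same type: a store to out[i] might change *scale,
// so the compiler has to allow for that and re-read *scale.
void scale_all(float* out, const float* scale, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] *= *scale;
    }
}

// The caller promises that out and scale do not overlap, so *scale may be
// read once and kept in a register. Breaking the promise is undefined
// behaviour.
void scale_all_restrict(float* __restrict out, const float* __restrict scale,
                        std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] *= *scale;
    }
}
```

The plain version has to stay correct even for a call like `scale_all(a, &a[0], 3)`, where the scale factor changes after the first store. That obligation is exactly what the qualifier removes.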

This is an interesting viewpoint, but is unfortunately light on details: https://lobste.rs/s/yubalv/pointers_are_complicated_ii_we_ne...

Don't get me wrong, I'm not saying aliasing is a big conspiracy :) But it seems to have one of the higher hype-to-reality disconnects for compiler optimizations, in my limited experience.

Back in 2015 when the Rust project first had to disable use of LLVM's `noalias` they found that performance dropped by up to 5% (depending on the program). The big caveat here is that it was miscompiling, so some of that apparent performance could have been incorrect.

Of course, that was also 10 years ago, so things may be different now. The Rust project will have had an interest in improving the optimisations `noalias` enables, and Clang will have improved its optimisations under C and C++'s aliasing model.

Thanks! I've heard a lot of anecdotes like this, but I've never found anyone presenting anything I can reproduce myself.

Strict aliasing is not the only kind of aliasing.

Yes, that's why I described it as "silly" :)

Is there a better way to test the contribution of aliasing optimizations? Obviously the compiler could be patched, but that sort of invalidates the test because you'd have to assume I didn't screw up patching it somehow.

What I'm specifically interested in is how much more or less of a difference the class of optimizations makes on different calibers of hardware.

Well, the issue is that "aliasing optimizations" means different things in different languages, because what you can and cannot do is semantically different. The argument against strict aliasing in C is that you give up a lot and don't get much, but that doesn't apply to Rust, which has a different model and uses these optimizations much more.

For Rust, you'd have to patch the compiler, as they don't generally provide options to tweak this sort of thing. For both Rust and C this should be pretty easy to patch, as you'd just disable the production of the noalias attribute when going to LLVM; gcc instead of clang may be harder, I don't know how things work over there.

Thanks!

CUDA hardware is specifically designed to match the C++ memory model.

It wasn't initially, and then NVIDIA went through a multi-year effort to redesign the hardware.

If you're curious, there are two CppCon talks on the matter.