Hacker News
by haberman 1076 days ago
> The point to me is that using memcpy instead of pointer casts is NOT an improvement.

The improvement comes when there are multiple accesses that could potentially point to the same memory. Consider a silly function:

    void f(int16_t* a, int32_t* b) {
      for (int32_t i = 0; i < 100; i++) {
        b[i] = a[0] + i;
      }
    }
If type-based alias analysis is enabled, the compiler can assume that a[0] does not alias b[i], because int16_t and int32_t are incompatible types and neither is a character type. So it can hoist the load of a[0] out of the loop, improving efficiency. If strict aliasing is disabled (e.g. with -fno-strict-aliasing), it cannot assume this, so it must reload a[0] on every iteration: https://godbolt.org/z/E7jxfYsbx

Using memcpy() tells the compiler that the memory could alias anything, so the compiler generates the less efficient code even if strict aliasing is enabled: https://godbolt.org/z/KoPxK9fPj
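For readers who don't want to open the link, here is a rough sketch of what a memcpy variant of f() could look like. The name f_memcpy and the const void* parameter are my own assumptions for illustration and are not necessarily what the linked example uses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical memcpy variant of f(). The source is read as raw bytes,
   so type-based alias analysis tells the compiler nothing about whether
   it overlaps b[]. */
void f_memcpy(const void *a, int32_t *b) {
  assert(a != NULL && b != NULL);
  for (int32_t i = 0; i < 100; i++) {
    int16_t a0;
    /* Byte-wise copy: this may legally read memory that a previous
       store to b[] modified, so the load generally stays in the loop. */
    memcpy(&a0, a, sizeof a0);
    b[i] = a0 + i;
  }
}
```

The result is the same as the original f() whenever a and b don't overlap; the difference is only in what the optimizer is allowed to assume.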

Memory aliasing is a huge thorn in the side of the optimizer, because the compiler frequently has to allow for the possibility that different pointers will alias each other, even if they never will in practice. The code might end up being slower than necessary for no real reason. Strict aliasing is one of the few tools we have to tell the compiler that aliasing will not occur.
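Another of those tools, for comparison, is the C99 restrict qualifier. This is a minimal sketch under the assumption that callers never pass overlapping buffers; the name f_restrict is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical restrict variant of f(). The qualifiers are a promise by
   the caller that, for the duration of the call, the objects accessed
   through a and b do not overlap. Breaking that promise is undefined
   behavior. */
void f_restrict(const int16_t *restrict a, int32_t *restrict b) {
  assert(a != NULL && b != NULL);
  for (int32_t i = 0; i < 100; i++) {
    /* The compiler may treat a[0] as loop-invariant here regardless of
       the pointed-to types. */
    b[i] = a[0] + i;
  }
}
```

Unlike strict aliasing, this works even when both pointers have the same type, but the no-overlap promise is entirely the caller's responsibility.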

I don't think that C actually forbids this code:

     *(int*)0x12345678
The rule is just: if you access it as an int, you have to consistently access it as an int. You can't mix types from one access to the next, e.g.:

    *(long*)0x12345678
    *(int*)0x12345678
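When you really do need to look at the same bytes as a different type, copying them out is the usual way to stay within that rule. A minimal sketch, assuming float is a 32-bit IEEE 754 type on the target; float_bits is a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the object representation of a float.
   Writing *(uint32_t *)&x instead would access a float object through
   a uint32_t lvalue, which is the kind of mixed access described above. */
uint32_t float_bits(float x) {
  static_assert(sizeof(float) == sizeof(uint32_t),
                "this sketch assumes a 32-bit float");
  uint32_t bits;
  memcpy(&bits, &x, sizeof bits);  /* copy the bytes, no type-punned access */
  return bits;
}
```

Each object is only ever accessed as its own type here (x as a float, bits as a uint32_t); memcpy moves the bytes between them.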
2 comments

> Strict aliasing is one of the few tools we have to tell the compiler that aliasing will not occur.

I can see the argument, but there's a much better way to indicate what you want with your example:

    void f(int16_t* a, int32_t* b) {
      const int16_t a0 = a[0];
      for (int32_t i = 0; i < 100; i++) {
        b[i] = a0 + i;
      }
    }
Now a clean (well-defined) compiler could do what you asked.

I've seen other people suggest that UB is a mechanism to have these magical backdoor conversations with the compiler to express optimization opportunities. I think that's absurd and reckless. Propose adding assertions or "declare" statements instead, and quit thinking of interpretive dance through a minefield as a method of communication.

You are entitled to your opinion. C isn't perfect, but as someone who spends my life trying to optimize the efficiency and code size of critical loops to the max, I like the direction C has gone with UB and optimizations. It's not the right tool for every problem, but for the most size/speed critical code it's hard to beat IMO.

> I don't think that C actually forbids this code:

     *(int*)0x12345678
If not, give it time. It was only a few years ago that you were allowed to use a union for that kind of thing. I really believe they'll eventually make everything except unsigned integers UB.

"Oh, the code was never correct. You just got lucky before."