Hacker News new | ask | show | jobs
by jstimpfle 716 days ago
In your godbolt

  extern volatile int x;
  int ub() {
      int r = x;
      r += 1/0;
      return r;
  }
and the output is

  ub:
        mov     eax, DWORD PTR x[rip]
        ud2
I don't see what is the side effect that you say is removed here?

As for the earlier example (hoisting the division out of the loop), I was going to write a wall of text explaining why I find the behaviour totally intuitive and in line with what I'd expect.

But we can make it simpler: The code doesn't even have any observeable side effect (at least I think so), because it only reads the volatile, never writes it! The observeable behaviour is exactly the same as if the hoist hadn't happened. I believe it's a totally valid transformation, at least I don't have any concerns with it.

2 comments

I think this one better illutrates the point you were making: https://godbolt.org/z/qbfhb6dKo

Here I've inserted an increment of the volatile (i.e. a write access) at the start of the loop. If the divisor is 0, in the optimized version with the division hoisted out of the loop, the increment will never actually happen, not even once. Whereas it should in fact happen 1x at the beginning of the first loop iteration with "unoptimized" code.

I don't find this offputting: First, the incrementing code is still in the output binary. I think what is understood by "time travel", and what would be offputting to most programmers, is if the compiler was making static inferences and was removing entire code branches based on that -- without telling the user. If that was the case, I would consider it a compiler usability bug. But that's not what's happening here.

Second, I think everybody can agree that the compiler should be able to reorder a division operation before a write access, especially when hoisting the division out of a loop. So while maybe an interesting study, I think the behaviour here is entirely reasonable -- irrespective of standards. (But again, I don't think uecker, nor anyone else, said that the compiler may never reorder divisions around side-effecting operations just because the division "could" be UB).

Well, that hoisting is contrary to what uecker says is the standard intent.

I think that discussing about omitting branches is a red herring, there is no expectation that the compiler should emit branches or basic blocks that match the source code even in the boring, non-ub case.

The only constraint to compiler optimizations is the as-if rule and the as-if rule only requires that side effects and their order be preserved for conforming programs. Uecker says that in addition to conforming programs, side effects and their ordering also need to be preserved up to the UB.

I do of course also find it unsurprising that the idiv is hoisted, but, as the only way that the standard can constraint compilers is through observable behaviour, I don't see how you can standardize rules where that form of hoisting is allowed while the other are not.

In fact the compiler could easily optimize that loop while preserving the ordering by transforming it to this:

  extern volatile int x;
  int ub(int d, int c) {
    int r;
    x += 3;
    r += x;
    int _div = d / c;
    r += _div;

    for (int 2 = 0; i < 100; ++i) {
        x += 3;
        r += x;
        r += _div;
    }
    return r;
  }
This version preserves ordering while still optimizing away the div. In fact this would also work if you replaced the volatile with a function call, which currently GCC doesn't optimize at all.
Thanks for clarifying, I understand much better now.

And I think I can agree that under a strict interpretation of the rule that UB doesn't get reordered with observable behaviour the GCC output in the godbolt is wrong.

Maybe it has something to do with the fact that it's volatiles? I've hardly used volatiles, but as far as I know their semantics have traditionally been somewhat wacky -- poorly understood by programmers and having inconsistent implementations in compilers. I think I've once read that a sequence of volatile accesses can't be reordered, but other memory accesses can very well be reordered around memory accesses. Something like that -- maybe the rules in the compiler are too complicated leading to an optimization like that, which seems erroneous.

But look at this, where I've replaced the volatile access with a printf() call as you describe: https://godbolt.org/z/Ec8aYnc3d . It _does_ get optimized if the division comes before the printf. The compiler seems to be able to do the hoisting (or maybe that can be called "peeling" too?). But not if you swap the two lines such that the printf comes before the division. Maybe the compiler does in fact see that to keep ordering of observable effects, it would have to duplicate both lines, effectively duplicating the entire loop body for a single loop iteration. In any case, it's keeping both the printf() and the div in the loop body.

I do believe that by a strict reading of the standard GCC is non conforming here. This reading of the standard is not agreed by the GCC developers though: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104800

If the first div happens before the first printf, then it can be CSE out of the loop as any trap would have happened before the printf anyway, so no reordering, and if it didn't trap the first time it wouldn't have trapped later either. In this case CSE is fine and there are no reordering.

If the div happens after printf, then reordering is prohibited not only to preserve side effects before UB (which we have seen GCC doesn't necessarily respects), but because for the most part printf is treated as an opaque function: it could legitimately exit, or longjump out of the function or never return, so on the abstract machine the UB might not happen at all. So it is not safe to hoist trapping instruction like div above opaque functions (but it is safe to sink them).

Still the modification I showed for volatile can be applied as well: peel the first iteration out of the loop so that the first printf can be done before computing the div to be CSEd out of the loop. But GCC doesn't do it although it seems desirable.

I'm very sorry, yes, you are right of course the load is still there. I was so fixated in producing a minimal test case that I failed to interpret the result.

Now I'm not able to reproduce the issue with a guaranteed UB. I still think the loop variant shows the same problem though.

In any case, yes, according to the C standard a volatile read counts as an observable side effect.