Hacker News new | ask | show | jobs
by Ontonator 719 days ago
> Somewhat expectedly, gcc remains faithful to its crash approach, though note that it only inserts the crash when it compiles the division-by-zero, not earlier, like at the beginning of the function. […] The mere existence of UB in the program means all bets are off and the compiler could chose to crash the function immediatley upon entering it.

GCC leaves the print there because it must. While undefined behaviour famously can time travel, that’s only if it would actually have occurred in the first place. If the print blocks indefinitely then that division will never execute, and GCC must compile a binary that behaves correctly in that case.

5 comments

Don't worry; a function blocking indefinitely (i.e., there is some point where it stops giving off side effects, and never returns) is also UB. C++ attempts to guarantee a certain amount of forward progress, for now.
But a function blocking indefinitely while repeatedly writing to a volatile variable is well-defined. So the compiler cannot remove a function call followed by UB unless it knows that the function won't do that.

In theory, the compiler could know that since `printf` is a well-known standard function. In practice, `printf` might even exit the program via SIGPIPE, so I don't think any compiler will assume that it definitely will return.

/puts shell developer hat on

Not just sigpipe. Getting sigttou is part and parcel of how command line apps share control of the tty.

Be careful: it’s been a while since I used C and I haven’t used much C++, but I think forward progress is only guaranteed by C++, not C.
In C, undefined behavior can not time travel. This was never supported by the wording and we clarified this in C23.
Do you have a specific reference for this? I’d love to know more.
Same here, would love to educate myself on this with specific standard references
You can find the C standard for C23 here. https://www9.open-std.org/JTC1/SC22/WG14/www/docs/n3220.pdf (technically it is the draft for the next version, but only has a footnote changed). The definition for UB is (not changed with C23):

"undefined behavior - behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this document imposes no requirements"

It says "for which" and not "for the whole program". So the "interpretation" that a program becomes invalid or other requirements are affected was never supported by the text in the C standard. This was justified by some with arguments such as: "no requirements" includes behavior that can go back in time or otherwise break the rules of the real world. Of course, if you start to interpret a standard text in this creative way, you can justify anything. A reasonable interpretation must assume some general properties of the application domain it applies to (and in C there also more specific rules about how requirements are to be interpreted in 5.1) and must assume that wording such as "for which" has actual meaning that needs to be taken into account and can not simply be ignored. In C23 "Note 3" was added that aims to clarify that such nonsensical interpretations are not intended:

"Note 3 to entry: Any other behavior during execution of a program is only affected as a direct consequence of the concrete behavior that occurs when encountering the erroneous or non-portable program construct or data. In particular, all observable behavior (5.1.2.4) appears as specified in this document when it happens before an operation with undefined behavior in the execution of the program"

Does this means that compilers cannot in general reorder non-side-effects operations across side effects, even if those operations wouldn't be globally visible if not for UB? Alternatively, is UB ordered with side effects? What's the ordering of UB with regard to other thread operations? Does it guarantee sequential consistency? I guess happens-before is guaranteed only if something else would guarantee happens before, but it means further constraint on ordering of, for example, potentially faulting operations across atomic operations.

Ie:

   int ub(int idx, _Atomic int* x)) {
       char y[1000];
       int r = *x;    // 2
       r+= y[idx];    // 1
       return r;
   }
Statements 1 and 2 can't be reordered as any UB in accessing y[idx] is sequenced-after any side effect that happens-before 2, even if y is a purely local, non-escaping variable. This puts constraints on register allocation for example.

This opens a big can of worms.

edit: from a few quick tests GCC seems quite good at preserving ordering of potentially faulting instructions across side effects, even when reordering would be profitable (for example hoisting operations out of loops). It might be a posix requirement in practice because signals make them "visible".

edit2:

ok, this "fails":

  extern volatile int x;
  int ub(int d, int c) {
    int r;
    for (int i = 0; i < 100; ++i) {
      r+= x;   // 1
      r+= d /c; // 2
    }
    return r;
  }
GCC -O3 will hoists out of the looop the potentially faulting (and expensive) division at [2] before the volatile load at [1] (which is a side effect). Would you consider this a valid transformation or is it time-traveling UB?

Potentially this could be fixed by unrolling out the first iteration of the loop and preserving the first volatile access above the division, which can then be cached. This would also help with the variant where the volatile access is replaced by a function call that currently gcc doesn't optimize.

The can of worms is not so big actually. In general, observable behavior is only I/O and volatile accesses. This is not about side effects in general (which can be optimized according to the "as if" rule). So many things can still be reordered vs potentially trapping operations. Also potentially trapping operations can be reordered. For multi-threading we have "happens before" relationship, so a natural interpretation is that everything which happens before the UB is safe.

The reordering of a volatile access after a potentially trapping operation is not conforming. I think it is an important property of volatile that it prevents this optimization, so I hope that GCC will be fixed eventually. A potentially trapping operation can also not be hoisted above a function call, and compilers that did this all got fixed in the mean time.

https://developercommunity.visualstudio.com/t/Invalid-optimi...

> If the print blocks indefinitely then that division will never execute, and GCC must compile a binary that behaves correctly in that case.

Is `printf` allowed to loop infinitely? Its behaviour is defined in the language standard and GCC does recognize it as not being a user-defined function.

I’m not sure it can loop indefinitely, but it can block (e.g. if the reader of the pipe is not reading from it and the buffer is full).
Your reasoning is incorrect. Here is how I reason about it.

Division by zero is undefined behavior. The compiler can assume that it will not happen.

If the divisor is not zero, then the calculation has no side effects. The compiler may reorder the division above the print, because it would have no observable difference in behavior. This could be useful because division has a high latency, so it pays to start the operation as soon as the operand values are known.

If the divisor is zero, the UB says that there is no requirement on how it's compiled, so reordering the division above the print is legal.

  const int div = 0;
  if(div) { 
    return 1/div;
  }
  return 0;
The statement at line 3 would have undefined behaviour, yet is never reached so this is a perfectly legal program and any transformation that hoists it above the check is invalid.

If you replace 'if(div)' with an opaque function call, that doesn't change anything as the function might never exit the program, never return, long jump or return via an exception.

How can it both have no side effects and have undefined behaviour?
By being declared as that?

Division has no side effects, and division by 0 is UB. UBs only occur in invalid programs, so behaviour in case of UB is not relevant to a discussion of side effects or their lack thereof, in language terms these are not programs at all.

If print statement never completes then it is a well defined program. Because there will be no division by zero.
And if my grandmother had wheels she'd be a bike.
Is your grandmother a bike every time the receiving side of a pipe terminates before the sending side, or only when it happens to interrupt a printf?
Both statements don't need to be true. The compiler just has to prove that at least one of the statements will always be true.
True but it only has no side effects if it's well defined. The compiler can assume it's well defined but only if it's executed.
> While undefined behaviour famously can time travel, that’s only if it would actually have occurred in the first place.

I've always been told that the presence of UB in any execution path renders invalid all possible execution paths. That is, your entire program is invalid once UB exists, even if the UB is not executed at runtime.

Are you saying this isn't quite true?

That's not true.

If you do `5 / argc`, that's only undefined behavior if your program is called without any arguments; if there are arguments then the behavior is well defined.

Instead, the presence of UB in the execution path that is actually taken, renders invalid the whole execution path (including whatever happens "before" the UB). That is, an execution path has either defined or undefined behavior, it cannot be "defined up to point-in-time T". But other execution paths are independent.

Thus, UB can "time-travel", but only if it would also have occurred without time travel. It must be caused by something happening at runtime in the program on the time-travel-free theoretical abstract machine; it cannot be its own cause (no time travel paradoxes).

So the "time-travel" explanation sounds a lot more scary than it actually is.

Pretty sure argc is 1 in your example, the name of the binary, no?

Edit: argv is the name, argc will be 1

Yes. It's possible to get `argc` to equal zero, though by invoking the program using `execve(prog, {NULL}, {NULL})` on Linux. This has, rather famously, caused at least one out-of-bounds error in a security-critical program (CVE-2021-4034 "Pwnkit", LPE by invoking Polkit's pkexec with a zero-length argv).
Didn’t even think about that. Very good point.
It's possible to call programs without any arguments, not even the path to the binary. I believe passing the path to the binary is merely a shell convention, because when calling binaries directly from code (not through the shell), sometimes it's possible to forget to specify arg 0 (if your chosen abstraction doesn't provide it automatically). I bet this has caused tons of confusion for people.
It is fully possible to launch a program with argc = 0. But yes, replace argc with (argc - 1) in the example to match the typical case.
> Are you saying this isn't quite true?

It is not. The presence of UB in an execution path renders that execution path invalid. UBs are behaviours, essentially partial functions which are allowed to arbitrarily corrupt program state rather than error.

However "that execution path" can be extensive in the face of aggressive advanced optimisations.

The "time travel" issue is generally that the compiler can prove some paths can't be valid (they always lead to UB), so trims out those paths entirely, possibly leaving just a poison (a crash).

Thus although the undefined behaviour which causes the crash "should" occur after an observable side-effect, because the program is considered corrupt from the point where it will inevitably encounter an UB the side-effect gets suppressed, and it looks like the program executes non linearly (because the error condition which follows the side effect triggers before the side effect executes).

Note that C++ has time-travel, C has not. A printf on a path which later encounters an operation with UB needs to be preserved.
Hmm, it could be that once UB is encountered the entire program becomes invalid, then. In practice, a lot of UB is quite subtle and may not necessarily result in complete disaster, but of course once it's occurred you could end up in any number of completely invalid states and that would be the fault of the UB.
> Hmm, it could be that once UB is encountered the entire program becomes invalid, then.

The UB doesn't actually need to be encountered, just guaranteed to be encountered eventually (in a non-statistical meaning), that is where the time travel comes from e.g. if you have

    if (condition) {
        printf("thing\n");
        1/0;
    } else {
        // other thing
    }
the compiler can turn this into

    if (condition) {
        // crash
    } else {
        // other thing
    }
as well as

    // other thing
In the first case you have "time travel" because the crash occurs before the print, even though in a sequentially consistent world where division by zero was defined as a crash (e.g. in python) you should see the print first.
> The UB doesn't actually need to be encountered, just guaranteed to be encountered eventually

That's what I meant. Anything that leads to UB itself has UB.

By this logic the function below would be UB. It isn't.

  void f()
  {
    int d;
    bool divide;
    std::cin >> d >> divide;
    std::cout << (divide ? 1/d : d);
  }