| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by kazinator 647 days ago

I'm not seeing how this is a change. C99 also said "for which".

C99: "behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements."

Martin Uecker said that something was fixed in the C23 draft, and when asked about it, pointed to the "for which" as pertaining to just that construct and not the entire program.

I'm afraid that this fellow showed himself unreliable in that thread, in matters of interpreting the C standard. In any case, a random forum remark by a committee member, is not the same thing as a committee response to a request for clarification. It has to be backed by citations and precise reasoning, like anyone else's remark.

Suppose we have a sequence of statements S1; S2; S3 where S3 contains the expression i + 1, i being of type int, and nothing in these statements alters the value of i (it is live on entry into S1 and there is no other entry). It is valid for S1 to be translated according to the supposition that i is less than INT_MAX. Because if that is not the case, then S3 invokes undefined behavior, and S3 is unconditionally reachable via S1.

The whole idea that we can have an __notreached expression which does nothing but invoke UB is predicated on time travel (time travel at program analysis time: being able to reason about the program in any direction). Since __notreached invokes UB, the implementation may assume that the control flow does not reach it and behave accordingly. Any statement which serves as a gateway to __notreached itself invokes undefined behavior, and is therefore assumed unreachable and may be deleted. This reasoning propagates backwards, and so the optimizer can simply delete a whole swath of statements.

Backwards reasoning has been essential in the implementation of compilers for decades. Basic algorithms like liveness analysis involve scanning basic blocks of instructions in reverse order! How you know that a variable is dead at a given point (so its register can be reused for another value) is due to having scanned backwards: the next-use information is a peek into the future (what will be the future when that instruction is running).

And, about the quesiton whether undefined behavior can make the whole program undefined, the answer is maybe. If there is no way to execute the program such that an undefined behavior is avoided, then the whole program is undefined. If the situation can be deduced while it is being translated, then the translator can stop with a diagnostic message.

E.g.:

  #include <stdio.h>

  int main() // (void) is now deprecated in draft ISO C
  {
     printf("hello, world\n");
     return 0/0;
  }

This program does not have a visible behavior of printing the hello message. The 0/0 division is undefined, and amounts to a declaration that the printf statement is unreachable. The implementation is free to delete that statement, or to issue a diagnostic and not translate the program.

Uecker is right in that there are limits on this. If a program issues some output (visible effect) and then performs input, from which it obtains a value, and that value is then embroiled in a calculation that causes undefined behavior, that previous visible effect stands. The whole program is not undefined. It's something like: the program's execution becomes undefined at the point where it becomes inevitable that UB shall occur. That could be where the value is prepared that will inevitably cause the erroneous calculation. So, as far back as that point of no return, the implementation could insert a termination, with or without a diagnostic.

Undefined behavior is not a visible behavior; it doesn't have to be ordered with regard to visible behaviors.