UB can not travel back in time in C. Although it is true that it can affect previous instructions, but that code is reordered or transformed in complicated ways is true even without UB.
Yes, random blog posts did a lot of damage here. Also broken compilers [1]. Note that blog post is correct about C++ but incorrectly assumes this is true for C as well.
I'm inclined to trust Raymond Chen and John Regehr on these matters, so if you assert that they're incorrect here then a source to back up your assertion would help your argument.
I am a member of WG14. You should check the C standard. I do not see how "time-travel" is a possible reading of the definition of UB in C. We added another footnote to C23 to counter this idea:
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf
"Any other behavior during execution of a program is only affected as a direct consequence of the concrete behavior that occurs when encountering the erroneous or non portable program construct or data. In particular, all observable behavior (5.1.2.4) appears as specified in this document when it happens before an operation with undefined behavior in the execution of the program."
I should point out that compilers also generally do not do true time-travel: Consider this example: https://godbolt.org/z/rPG14rrbj
So maybe we have different definitions of "time travel". But I recall that
- if a compiler finds that condition A would lead to UB, it can assume that A is never true
- that fact can "backpropagate" to, for example, eliminate comparisons long before the UB.
There may be different definitions, but also a lot of incorrect information. Nothing changes with C23 except that we added a note that clarifies that UB can not time-travel. The semantic model in C only requires that observable effects are preserved. Everything else can be changed by the optimizer as long as it does not change those observable effects (known as the "as if" principle). This is generally the basis of most optimizations. Thus, I call time-travel only when it would affect previous observable effects, and this what is allowed for UB in C++ but not in C. Earlier non-observable effects can be changed in any case and is nothing speicifc to UB. So if you call time-travel also certain optimization that do not affect earlier observable behavior, then this was and is still allowed. But the often repeated statement that a compiler can assume that "A is never true" does not follow (or only in very limited sense) from the definition of UB in ISO C (and never did), so one has to be more careful here. In particular it is not possible to remove I/O before UB. The following code has to print 0 when called with zero and a compiler which would remove the I/O would not be conforming.
int foo(int x)
{
printf("%d\n", x);
fflush(stdout);
return 1 / x;
}
In the following example
int foo(int x)
{
if (x) bar(x);
return 1 / x;
}
the compiler could indeed remove the "if" but not because it were allowed to assume that x can never be zero, but because 1 / 0 can have arbitrary behavior, so could also call "bar()" and then it is called for zero and non-zero x and the if condition could be removed (not that compilers would do this)
There is unconditional use of a pointer b, which is UB if b is null. However, there is an earlier branch that checks if b is null. If we expected the UB to "backpropagate", the compiler would eliminate that branch, but both gcc and clang at O3 keep the branch.
However, both gcc and clang have rearranged the side effects of that branch to become visible at the end of the function. I.e. if b is null, it's as if that initial branch never ran. You could observe the difference if you trapped SIGSEGV. So even though the compiler didn't attempt to "time-travel" the UB, in combination with other allowed optimizations (reordering memory accesses), it ended up with the same effect.
While interactions with volatile and interactive streams cannot time-travel, anything else is free to - the standard only imposes requirements on a conforming implementation in terms of the contents of files at program termination, and programs with undefined behaviour are not required to terminate, so there are approximately no requirements on a program that invokes undefined behaviour.
The issue you linked to is not a counter example because, as the poster said, g may terminate the program in which case that snippet does not have undefined behaviour even if b is zero. The fact that they bothered to mention that g may terminate the program seems like an acknowledgement that it would be valid to do that time travelling if it didn't.
> Note that blog post is correct about C++ but incorrectly assumes this is true for C as well.
Presumably you're referring to this line of the C++ standard, which does not appear in the C standard:
> However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).
I looked at every instance of the word "undefined" in the C standard and, granted, it definitely didn't have anything quite so clear about time travel as that. But it also didn't make any counter claims that operations before are valid. It pretty much just said that undefined behaviour causes behaviour that is undefined! So, without strong evidence, it seem presumptuous to assume that operations provably before undefined behaviour are well defined.
The poster is me. You are right that this is not an example for time-travel. There aren't really good examples for true time travel because compilers generally do not do this. But my point is that with compilers behaving like this, people might confuse this for time-traveling UB. I have certainly met some who did and the blog posts seems to have similar examples (but I haven't looked closely now).
I didn't notice that section when I last read through C23, but I'm very glad to see it. Reining in UB is one of the hardest problems I've had to deal with, and being able to say operations are defined up to the point of UB makes my job so much easier.
The lack of clarity in earlier standards made it impossible to deal with code incrementally, since all the unknown execution paths could potentially breach back in time and smash your semantics.
Ok, fair enough. I must admit I was looking at C99 as I thought that was most generally relevant, I don't follow recent C standards (as much as I do those for C++) and C23 hasn't been ratified yet. I've found your new snippet:
> In particular, all observable behavior (5.1.2.4) appears as specified in this document when it happens before an
operation with undefined behavior in the execution of the program.
I consider that a change in the standard but, of course, that's allowed, especially as it's backwards compatible for well defined programs.
The wording is a little odd: it makes it sound a like you need some undefined behaviour in order to make the operations beforehand work, and, taken very literally, that operations between two undefined behaviours will work (because they're still "before an operation with undefined behavior"). But I suppose the intention is clear.
The definition of UB (which hasn't changed) is: "behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this document imposes no requirement."
Note that the "for which" IMHO already makes this clear that this can not travel in time. When everything could be affected these words ("for which") would be meaningless.
Yes, and with undefined behavior, the compiler has to emit code that has the behavior defined by the code up to the operation that has undefined behavior.
That is false. If a compiler determines that some statement has undefined behavior, it can treat it as unreachable, and, transitively, other code before it as unreachable.
printf("hello\n"); // this doesn't have to print
x = x / 0; // because this is effectively a notreached() assertion
This is in direct contradiction to what uecker says. Can you back up your claim -- for both C and C++? Putting your code in godbolt with -O3 did not remove the print statement for me in either C or C++. But I didn't experiment with different compilers or compiler flags, or more complicated program constructions.
I've often said that I've never noticed any surprising consequences from UB personally. I know I'm on thin ice here and running risk of looking very ignorant. There are a lot of blogposts and comments that spread what seems like FUD from my tiny personal lookout. It just seems hard to come across measureable evidence of actual miscompilations happening in the wild that show crazy unpredictable behaviour -- I would really like to have some of it to even be able to start tallying the practical impact.
And disregarding whatever formulations there are in the standard -- I think we can all agree that insofar compilers don't already do this, they should be fixed to reject programs with an error message should they be able to prove UB statically -- instead of silently producing something else or acting like the code wouldn't exist.
Is there an error in my logic -- is there a reason why this shouldn't be practically possible for compilers to do, just based on how UB is defined? With all the flaws that C has, UB seems like a relatively minor one to me in practice.
This is an adaption from the Raymond Chen post, and it seems to actually compile to a "return 1" when compiling with C++ (not with C), at least with the settings I tried. And even the "return 1" for me is understandable given that we actually hit a bug and there are no observeable side-effects before the UB happens. (But again, the compiler should instead be so friendly and emit a diagnostic about what it's doing here, or better return an error).
Un-comment the printf statement and you'll see that the code totally changes. The printf actually happens now. So again, what uecker says about observable effects seems to apply.
In this [1] example GCC hoists, even in C mode, a potentially trapping division above a volatile store. If c=0 you get one less side effect than expected before UB (i.e. the division by zero trap). This is arguably a GCC bug if we agree on the new standard interpretation, but it does show that compilers do some unsafe time travelling transformations.
Hoisting the loop invariant div is an important optimization, but in this case I think the compiler could preserve both the optimization and the ordering of the side effects by loop-peeling.
Thanks for the example. But again I can't see a problem. The compiler does not actually prove UB in this case, so I suppose this doesn't qualify as applying (mis-) optimizations silently based on UB. Or what did I miss?
The implementation can assume that the program does not perpetrate undefined behavior (other than undefined behavior which the implementation itself defines as a documented extension).
The only way the program can avoid perpetrating undefined behavior in the statement "x = x / 0" is if it does not execute that statement.
Thus, to assume that the program does not invoke undefined behavior is tantamount to assuming that the program does not execute "x = x / 0".
But "x = x / 0" follows printf("hello\n") unconditionally. If the printf is executed, then x = x / 0 will be executed. Therefore if the program does not invoke undefined behavior, it does not execute printf("hello\n") either.
If the program can be assumed not to execute printf("hello\n"), there is no need to generate code for it.
Look at the documentation for GCC's __builtin_unreachable:
> If control flow reaches the point of the __builtin_unreachable, the program is undefined. It is useful in situations where the compiler cannot deduce the unreachability of the code.
The unreachable code assertion works by invoking undefined behavior!
x/0 is not reached if the printf blocks forever, exits or return via an exceptional path (longjmp in C, exceptions in C++). Now specifically standard printf won't longjmp or exit (but glibc one can), but it still can block forever, so the compiler in practice can't hoist UB over opaque function calls.
edit: this is in addition to the guarantees with regard to side effects that uecker says the C standard provides.
But does `printf();` return to the caller unconditionally?
This is far from obvious -- especially once SIGPIPE comes into play, it's quite possible that printf will terminate the program and prevent the undefined behavior from occurring. Which means the compiler is not allowed to optimize it out.
Yeah but do you have an actual instance of "time travel" happening? Without one the issue is merely theoretic discussion of how to understand or implement the standards. If you provide a real instance, the practical impact and possible remedies could be discussed.
#include <stdio.h>
int f(int y, int a) {
int x, z;
printf("hello ");
x = y / a;
printf("world!");
z = y / a;
return x+y;
}
In godbolt, it seems the compiler tends to combine the two printfs together. So if a=0, it leads to UB between the printfs, but that wont happen until after the two printfs. Here the UB is delayed. But will the compiler actually make sure that in some other case, the x/a won't be moved earlier somehow? Does the compiler take any potentially undefined behavior and force ordering constraints around them? ...The whole point of UB is to be able to optimize the code as if it doesn't have undefined behavior, so that we all get the maximum optimization and correct behavior as long as there's no UB in the code.
If a compiler can determine that some statement is UB, it can treat that as an assertion that the code is unreachable. All other statements which reach only that code and no other are also unreachable.
A compiler's analysis can go backward in time. That is to say, the compiler can build a model of what happens in some section of code over time, and analyze it whichever way it wants.
You cannot go back in time from execution time to translation time, but the translator can follow the code as if it were executing it at translation time.
I'm not enough of a specification lawyer to say that this is definitely true, but the reasoning and example given there seems sound to me.
[1] https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=63...