by joelkevinjones 1860 days ago
Two points:

The notion that a compiler that encounters undefined behavior is allowed to generate any code it wants is a new interpretation, for some value of "new". I don't recall seeing compiler writers use such an interpretation to justify something they wanted to do until sometime after 2000.

The notion that John Regehr has (quoted in the article) that undefined behavior implies the whole execution is meaningless is not supported by the language of either the C89 or C99 standard, at least by my reading. The C89 standard has a notion of sequence points. Wouldn’t all sequence points executed before undefined behavior is encountered be required to occur as if the undefined behavior wasn’t there? It would seem so:

From the C89 standard: 2.1.2.3 Program execution

The semantic descriptions in this Standard describe the behavior of an abstract machine in which issues of optimization are irrelevant. Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression may produce side effects. At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place.

The C99 standard has nearly identical language:

5.1.2.3 Program execution

1 The semantic descriptions in this International Standard describe the behavior of an abstract machine in which issues of optimization are irrelevant.

2 Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression may produce side effects. At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place.

3 comments

> Wouldn’t all sequence points executed before undefined behavior is encountered be required to occur as if the undefined behavior wasn’t there? It would seem so:

No. Code optimization is a series of logic proofs. It is like playing Minesweeper. If a revealed square shows a count of 1 and you have already located one neighboring mine, then you know that the 7 other neighboring squares are safe. In other Minesweeper situations you build a much more complex proof that lets you clear squares many steps away from a revealed mine. If a faulty proof leads you to a false assumption about where a mine is, then you explode.

The compiler is exactly like that. "If there is only one possible code path through this function, then I can assume the range of inputs to this function, then I can assume which function generated those inputs..."

You can see how the compiler's optimization proof goes "back in time" proving further facts about the program's valid behavior.

If the only valid array indexes are 0 and 1 then the only valid values used to compute those indexes are those values that produce 0 and 1.
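To make that concrete, here is a minimal sketch in C. The names `table`, `lookup` and `lookup_checked` are made up for illustration and are not from the article. Since an out-of-bounds access is undefined, a compiler is permitted (not required) to reason as if `x` is always -1 or 0 inside `lookup`, and to carry that assumption to other code that depends on `x`.

```c
/* Hypothetical two-entry table: the only valid indexes are 0 and 1. */
static const int table[2] = {10, 20};

/* Hypothetical helper: the index is computed as x + 1.
 * Any x other than -1 or 0 makes the access out of bounds, which is
 * undefined behavior, so an optimizer may assume x is -1 or 0 here. */
int lookup(int x)
{
    return table[x + 1];
}

/* Checking the input before computing the index keeps every
 * execution defined, so the check cannot be reasoned away. */
int lookup_checked(int x, int fallback)
{
    if (x < -1 || x > 0)
        return fallback;
    return table[x + 1];
}
```

Calling `lookup(0)` or `lookup(-1)` is fine. Calling `lookup(5)` gives the compiler no obligations at all, whereas `lookup_checked(5, -7)` simply returns the fallback.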

This isn't even program execution. In many cases the code is collapsed into precomputed results, which is why code benchmarking is complicated and not for beginners. Many naive benchmark programs collapse 500 lines of code and loops into "xor eax,eax; ret;". A series of putchar, printf and puts calls can be reduced to a single fwrite, and a malloc/free pair can be replaced with an implicit stack alloca, because all Standard Library functions are known and defined and there is no need to actually call them as written.
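A small sketch of the kind of naive benchmark that can collapse this way, assuming an optimizing build. The function name `naive_benchmark` is made up, and whether a particular compiler actually folds it depends on the compiler and flags. Unsigned arithmetic is used so every operation is well defined.

```c
/* Hypothetical "benchmark" body: a million iterations of work whose
 * net effect is nothing. Each iteration adds a value and then
 * subtracts the same value, so acc is 0 after every iteration.
 * Unsigned arithmetic wraps, so nothing here is undefined.
 * An optimizer that proves the result is always 0 may replace the
 * whole loop with a constant return, leaving nothing to time. */
unsigned naive_benchmark(void)
{
    unsigned acc = 0;
    for (unsigned i = 0; i < 1000000u; i++) {
        acc += i * 3u;
        acc -= i * 3u;
    }
    return acc;
}
```

Timing a call to this function measures the loop only if the compiler kept the loop. Comparing the generated assembly at different optimization levels is the reliable way to find out.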

The standard (in the prevailing reading of the UB section, and also in practice) places no requirements on the behavior of programs containing UB. None of the paragraphs you quoted has any bearing on how a UB-laden program behaves.

No, sequence points aren't relevant because they occur at runtime. "Time-traveling UB" happens in the compiler, typically in optimization passes, and can cause otherwise valid code to behave completely differently than it would if the UB didn't exist.
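A commonly cited illustration of an optimizer's reasoning affecting code away from the UB itself is a null check placed after a dereference. The sketch below uses made-up function names, and whether a given compiler removes the check depends on its version and flags.

```c
#include <stddef.h>

/* Hypothetical example: the dereference comes first. If p were NULL,
 * the load would already be undefined behavior, so an optimizer is
 * permitted to assume p != NULL and delete the check that follows,
 * even though the check looks like valid, defensive code. */
int read_then_check(const int *p)
{
    int v = *p;
    if (p == NULL)      /* may be treated as always false */
        return -1;
    return v;
}

/* Checking before the dereference keeps every path defined,
 * so the check has to be preserved. */
int check_then_read(const int *p)
{
    if (p == NULL)
        return -1;
    return *p;
}
```

With a valid pointer both functions return the pointed-to value. Only `check_then_read` has defined behavior for a null argument.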