by charcircuit 840 days ago
Because behavior does eventually get defined somewhere. Just because it's not defined in the C standard does not mean you can't reason about it.

No, if it were defined somewhere, it'd have consistent behavior and it wouldn't "time-travel" the way UB can. The standards' term for behavior that is left to the implementation is unspecified behavior (or implementation-defined behavior, when the implementation must document its choice). Undefined behavior, by contrast, carries no requirements at all. Different parts of the toolchain and runtime environment (or even different compiler passes) may assume different behaviors for the same construct. Even different calls to the same function with the same arguments may behave differently.

Let's walk through a simple example to make this clear. Assume you have a function-like macro foo() that triggers some trivial UB, perhaps signed integer overflow. Let's also say that this macro is invoked the same way in two different translation units. Because there are no requirements on UB by definition, there's no guarantee that those invocations will do the same thing, even on the same runtime, using the same compiler, with the same flags. Even the same line of code, invoked with the same arguments, may behave differently each time it runs, because again there are no required behaviors.
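To make the overflow case concrete, here is a minimal C sketch. The names `INCREMENT_WRAPS`, `increment_wraps_ub`, and `increment_wraps_checked` are hypothetical stand-ins for the foo() macro described above, not code from the thread. It contrasts a check that depends on signed overflow, which an optimizer may fold to a constant because it is allowed to assume overflow never happens, with a check that never performs the overflowing addition.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical macro: relies on signed overflow, which is undefined
 * behavior when x == INT_MAX. An optimizing compiler may assume the
 * overflow never happens and fold the whole expression to 0. */
#define INCREMENT_WRAPS(x) ((x) + 1 < (x))

/* Well-defined only for x < INT_MAX. For x == INT_MAX the result is
 * whatever the compiler happened to generate at this call site. */
bool increment_wraps_ub(int x) {
    return INCREMENT_WRAPS(x);
}

/* Well-defined for every int: compares against the limit instead of
 * performing the overflowing addition. */
bool increment_wraps_checked(int x) {
    return x == INT_MAX;
}
```

For every input below `INT_MAX` the two functions agree. At `INT_MAX` only the second one has a result the standard guarantees; what the first one returns may depend on the optimization level and on the code surrounding the call.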

Even code that does not itself trigger UB, but sits on an execution path with UB, has no defined behavior and will commonly be omitted by optimizing compilers like GCC. This has resulted in Linux vulnerabilities where null pointer checks were omitted from the actual binary because the compiler had "proven", from an earlier dereference, that the pointer could not be null.
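The pattern behind that class of bug can be sketched in a few lines of C. This is an illustration under assumed names (`struct device`, `read_flags_unsafe`, `read_flags_safe` are hypothetical), not the actual kernel code. Whether a given compiler really deletes the check depends on its version and flags.

```c
#include <stddef.h>

/* Hypothetical type, for illustration only. */
struct device {
    int flags;
};

/* The dereference happens before the check. Because dereferencing a
 * null pointer is undefined, the compiler may conclude dev != NULL
 * and delete the check below as dead code. */
int read_flags_unsafe(struct device *dev) {
    int flags = dev->flags;   /* UB if dev is NULL */
    if (dev == NULL) {        /* may be removed by the optimizer */
        return -1;
    }
    return flags;
}

/* Checking first keeps every path well-defined, so the check has to
 * survive optimization. */
int read_flags_safe(const struct device *dev) {
    if (dev == NULL) {
        return -1;
    }
    return dev->flags;
}
```

With a valid pointer both functions return the same value. Only `read_flags_safe` can be called with `NULL` and still be relied on to return -1.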

>Because there are no requirements on UB by definition, there's no guarantee that those calls will do the same thing, even on the same runtime, using the same compiler, with the same flags.

Reread my comment. You are talking about behavior not defined by the C standard, which I addressed in that comment. Compilers are deterministic. Reproducible builds are a thing.

Reproducibility is an entirely unrelated issue. The same compiler can produce different assembly for the same code depending on the surrounding context, or for any number of other reasons. A reproducible build just means that you'll get the same binary each time you build it. Furthermore, the same generated assembly can produce different results each time it's run, as happens with data races. In that case, the only "definition" comes down to the essentially unknowable physical state of the system.

>Reproducibility is an entirely unrelated issue.

No, reproducibility is about having a defined output for a given source code and toolchain.

I wrote up a quick example demonstrating UB compiling to two different implementations: https://godbolt.org/z/nd7GrP44s

Ignore how silly the actual code is and notice that the -O0 assembly checks the pointers before dereferencing them while the -O2 assembly does not. Same compiler, same translation unit, different assembly. Calling each with null pointers will behave differently too. Run this with whatever reproducible toolchain you want. Reproducible builds are not about making undefined behavior deterministic; they're a separate and largely unrelated topic.

To make the example you showed me, you had to successfully reason about the compiler's output despite the UB. You understood how things were defined differently at different optimization levels.