Hacker News
by LoganDark 719 days ago
> While undefined behaviour famously can time travel, that’s only if it would actually have occurred in the first place.

I've always been told that the presence of UB in any execution path renders invalid all possible execution paths. That is, your entire program is invalid once UB exists, even if the UB is not executed at runtime.

Are you saying this isn't quite true?


That's not true.

If you do `5 / argc`, that's only undefined behavior if your program is called without any arguments; if there are arguments then the behavior is well defined.

Instead, the presence of UB in the execution path that is actually taken renders the whole execution path invalid (including whatever happens "before" the UB). That is, an execution path has either defined or undefined behavior; it cannot be "defined up to point-in-time T". But other execution paths are independent.

Thus, UB can "time-travel", but only if it would also have occurred without time travel. It must be caused by something happening at runtime in the program on the time-travel-free theoretical abstract machine; it cannot be its own cause (no time travel paradoxes).

So the "time-travel" explanation sounds a lot more scary than it actually is.
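A minimal sketch of the `5 / argc` point, assuming a made-up helper named `five_over` (not from the thread). Each call is its own execution, and only the call with a zero divisor is undefined:

```cpp
// Hypothetical helper for illustration; the name is not from the thread.
// Behaviour is defined for every nonzero n and undefined only for n == 0.
int five_over(int n) {
    return 5 / n;
}
```

Calling `five_over(argc)` from `main` is therefore well defined in any run where `argc` is nonzero, regardless of what would happen in a run where it is zero.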

Pretty sure argc is 1 in your example, the name of the binary, no?

Edit: argv[0] is the name; argc will be 1

Yes. It's possible to get `argc` to equal zero, though, by invoking the program using `execve(prog, {NULL}, {NULL})` on Linux. This has, rather famously, caused at least one out-of-bounds error in a security-critical program (CVE-2021-4034 "Pwnkit", a local privilege escalation triggered by invoking Polkit's pkexec with a zero-length argv).

Didn’t even think about that. Very good point.

It's possible to call programs without any arguments, not even the path to the binary. I believe passing the path to the binary is merely a shell convention: when calling binaries directly from code (not through the shell), it's sometimes possible to forget to specify arg 0 (if your chosen abstraction doesn't provide it automatically). I bet this has caused tons of confusion for people.

It is fully possible to launch a program with argc = 0. But yes, replace argc with (argc - 1) in the example to match the typical case.
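A defensive sketch for the empty-`argv` case. The helper name `program_name` and the fallback string are assumptions for illustration; the idea is simply to check `argc` before touching `argv[0]`:

```cpp
#include <string>

// Hypothetical helper: returns argv[0] when it exists, otherwise a fallback.
// Guards against argc == 0, where argv[0] is the terminating null pointer.
std::string program_name(int argc, char* const argv[],
                         const std::string& fallback = "<unknown>") {
    if (argc < 1 || argv == nullptr || argv[0] == nullptr) {
        return fallback;
    }
    return argv[0];
}
```

A `main` that starts with `program_name(argc, argv)` instead of reading `argv[0]` directly stays well defined even when launched with an empty argument vector.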
> Are you saying this isn't quite true?

It is not. The presence of UB in an execution path renders that execution path invalid. UBs are behaviours, essentially partial functions which are allowed to arbitrarily corrupt program state rather than error.

However, "that execution path" can be extensive in the face of aggressive advanced optimisations.

The "time travel" issue is generally that the compiler can prove some paths can't be valid (they always lead to UB), so trims out those paths entirely, possibly leaving just a poison (a crash).

Thus, although the undefined behaviour that causes the crash "should" occur after an observable side effect, the program is considered corrupt from the point where it will inevitably encounter UB, so the side effect gets suppressed. The program then appears to execute non-linearly, because the error condition that follows the side effect triggers before the side effect executes.

Note that C++ has time-travel, C has not. A printf on a path which later encounters an operation with UB needs to be preserved.

Hmm, it could be that once UB is encountered the entire program becomes invalid, then. In practice, a lot of UB is quite subtle and may not necessarily result in complete disaster, but of course once it's occurred you could end up in any number of completely invalid states and that would be the fault of the UB.

> Hmm, it could be that once UB is encountered the entire program becomes invalid, then.

The UB doesn't actually need to be encountered, just guaranteed to be encountered eventually (in a non-statistical sense). That is where the time travel comes from. For example, if you have

    if (condition) {
        printf("thing\n");
        1/0;
    } else {
        // other thing
    }

the compiler can turn this into

    if (condition) {
        // crash
    } else {
        // other thing
    }

as well as

    // other thing

In the first case you have "time travel" because the crash occurs before the print, even though in a sequentially consistent world where division by zero was defined as a crash (e.g. in Python) you should see the print first.
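If the print must reliably come first, one option is to turn the failing case into ordinary control flow so that no path has UB at all. A rough sketch, assuming made-up helpers `checked_div` and `print_then_divide` (C++17 for `std::optional`):

```cpp
#include <climits>
#include <cstdio>
#include <optional>

// Hypothetical helper: returns an empty optional instead of performing
// a division whose behaviour would be undefined.
std::optional<int> checked_div(int num, int den) {
    if (den == 0) return std::nullopt;                     // division by zero
    if (num == INT_MIN && den == -1) return std::nullopt;  // signed overflow
    return num / den;
}

// Prints first, then reports failure through the return value. Because no
// path here has UB, the compiler has to keep the print and its ordering.
bool print_then_divide(int num, int den, int& out) {
    std::printf("thing\n");
    std::optional<int> q = checked_div(num, den);
    if (!q) return false;
    out = *q;
    return true;
}
```

With this shape, the "crash" branch is just a `false` return, so there is nothing for the optimiser to treat as unreachable.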
> The UB doesn't actually need to be encountered, just guaranteed to be encountered eventually

That's what I meant. Anything that leads to UB itself has UB.

By this logic the function below would be UB. It isn't.

  void f()
  {
    int d;
    bool divide;
    std::cin >> d >> divide;
    std::cout << (divide ? 1/d : d);
  }
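To check this against concrete inputs, here is a sketch of the same function with the streams passed in as parameters (an adaptation for testability; the names `f_on` and `run_f` are made up). Only an input with `divide` true and `d` equal to zero reaches the undefined division; every other input has fully defined behaviour.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Same logic as f(), with the streams injected so it can be exercised
// without a terminal. The name f_on is hypothetical.
void f_on(std::istream& in, std::ostream& out) {
    int d = 0;
    bool divide = false;
    in >> d >> divide;             // bool is read as 0 or 1 by default
    out << (divide ? 1 / d : d);   // 1 / d is evaluated only if divide is true
}

// Convenience wrapper: feeds a string as input and returns what was printed.
std::string run_f(const std::string& input) {
    std::istringstream in(input);
    std::ostringstream out;
    f_on(in, out);
    return out.str();
}
```

For example, `run_f("0 0")` is well defined and yields `"0"`, even though `run_f("0 1")` would divide by zero.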