by AlotOfReading 840 days ago
It means you could know what the code will do, that's it. Even that's useful though. It means you can write complete formal models of the language and apply them against your code. The current situation is that you can only build partial formal models, and the assumptions those models rely on evaporate in the presence of UB. It's a really shitty way to do proofs.

Not knowing what the code will do also means that most of the safety critical code in your life is verified through a checkbox that essentially says "I promise there's no undefined behavior". For example, here's what MISRA says about undefined behavior:

    Rule 1.3: There shall be no occurrence of undefined or critical unspecified behaviour

    Analysis: Undecidable, System
It'd be nice to have at least the potential to analyze the code both as one of the people writing safety-critical code and a person who uses cars, planes, trains, etc.

You can absolutely write formal models in the presence of UB - encountering UB is just a call to do_anything(), and the scenarios in which UB happens are themselves well-defined. Determining whether any UB can happen is as "undecidable" as determining whether the program follows a given specification - undecidable in the general case, but likely decidable for most specific cases.
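A minimal sketch of what "UB is just a call to do_anything()" could look like for one operation, signed addition. The names outcome and model_add are made up for illustration and don't come from any real verification tool. The point is only that the condition under which `a + b` is undefined is itself an ordinary, checkable predicate:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical model type: either a defined value, or "anything may happen". */
typedef struct {
    bool is_ub;  /* true when the C standard leaves the result undefined */
    int value;   /* meaningful only when is_ub is false */
} outcome;

/* Model of `a + b` on int. The overflow test is written so that it
 * never overflows itself. */
outcome model_add(int a, int b)
{
    outcome r = { false, 0 };
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        r.is_ub = true;  /* the do_anything() case */
        return r;
    }
    r.value = a + b;     /* cannot overflow on this path */
    return r;
}
```

For example, model_add(1, 2) is a defined 3, while model_add(INT_MAX, 1) is flagged as the do_anything() case. Whether that flag is a useful thing to reason about is exactly what the rest of the thread argues over.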

Time travel may feel a little funky, since you can no longer ensure that anything leading up to the UB actually happened, but that might not matter much. Even if you have "shut_down_engines(); UB();" and are afraid of the engines never getting shut down, the UB could equivalently just run start_engines_back_up() afterwards, or, even without UB, some later code could see your off-by-four-billion number and decide it really needs to restart them (though yes, you could have some truly-supposed-to-be-irreversible actions).

I'm pretty sure engineers expected to follow "there shall be no occurrence of UB" are also expected to follow "there shall be no occurrence of behavior we didn't ask you to write" in general - in a car/plane/train integer overflow is likely gonna result in some pretty undesirable behavior regardless of whether that's because the compiler messed with it or because now all your calculations are off by four billion. (and sometimes the compiler can even optimize based on UB to some more desirable code, e.g. "x-y<0" to "x<y" for signed integers, or expanding the range of lengths a loop works on by promoting the index variable)
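A small sketch of why the "x-y<0" to "x<y" rewrite leans on overflow being undefined. The function names are made up for illustration. The third function computes the subtraction with wraparound using unsigned arithmetic, and assumes the usual layout where UINT_MAX == 2*INT_MAX + 1:

```c
#include <limits.h>
#include <stdbool.h>

/* The original form: undefined if x - y overflows. */
bool diff_is_negative(int x, int y) { return x - y < 0; }

/* The form a compiler may rewrite it to; defined for all inputs. */
bool less_than(int x, int y) { return x < y; }

/* What "x - y < 0" would mean if signed overflow were defined to wrap.
 * Unsigned subtraction wraps by definition; a result above INT_MAX
 * corresponds to a set sign bit (assumes UINT_MAX == 2*INT_MAX + 1). */
bool wrapped_diff_is_negative(int x, int y)
{
    return ((unsigned)x - (unsigned)y) > (unsigned)INT_MAX;
}
```

For inputs where the subtraction doesn't overflow, all three agree. For x = INT_MIN, y = 1, less_than says true but wrapped_diff_is_negative says false, so under wrapping semantics the rewrite would change the answer.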

And you do have UB sanitizers (and perhaps it'd be neat to have compilers have an option to define as much as is reasonable for absolutely critical software that for whatever reason was written in C).

And you cannot even meaningfully have an equivalent to sanitizers on defined operations - if an operation is explicitly defined, people may rely on it, and therefore it is unacceptable to ever warn on it! (OK, Rust does do a funky thing of making integer overflow trap on debug builds and defining it to wrap on release ones, but to me this does not seem like a reasonable approach to have on many things.)
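For comparison, a rough C sketch of the two overflow policies mentioned above (trap vs. wrap). It uses __builtin_add_overflow, which is a GCC/Clang extension rather than standard C; the wrapper names are made up for illustration:

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* True when a + b does not fit in an int. */
bool add_overflows(int a, int b)
{
    int ignored;
    return __builtin_add_overflow(a, b, &ignored);
}

/* "Release-style" policy: overflow is defined and wraps around. */
int add_wrapping(int a, int b)
{
    int result;
    (void)__builtin_add_overflow(a, b, &result);  /* stores the wrapped value */
    return result;
}

/* "Debug-style" policy: overflow stops the program. */
int add_trapping(int a, int b)
{
    int result;
    if (__builtin_add_overflow(a, b, &result))
        abort();
    return result;
}
```

With these, add_wrapping(INT_MAX, 1) gives INT_MIN, while add_trapping(INT_MAX, 1) aborts. Neither involves UB, which is the catch being described: once wrapping is defined, a tool can't tell a deliberate wrap from a bug.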

The scenarios in which UB can happen aren't actually well defined by the standards. They're just the negative space outside the constraints. I'll grant that most of the useful scenarios are listed though.

Time travel and inconsistency also prevent the "do_anything()" model from working. There is no consistent behavior in the presence of UB, and the program is not even guaranteed to be translated correctly leading up to that point.

As for running sanitizers on defined operations, all you would need to do is add a new category alongside implementation-defined, unspecified, and undefined behavior: behavior that is defined, but that it's explicitly illegal to rely on. You could also treat unspecified behavior this way, though I'd need to think about how dangerous that is.

Speaking of sanitizers, most certified compilers don't actually support them. I've unsuccessfully tried to convince a couple vendors that they're important and even gave them an appropriate bare metal runtime to use if only they'd do the work of calling it. No luck.

What happened up to "do_anything()" cannot matter - if you don't like interpreting it as actual time travel, you could alternatively interpret it as the UB rearranging the atoms of the universe to look like some different past happened - no time travel, but the result is the same. (Done literally you might encounter some issues with physics, but in most practical scenarios reversing some operation after it has happened is plenty simple; and in cases where it's not, a C compiler most likely couldn't even have a way to optimize it out, as arbitrary code may include "exit()", at which point removing the invocation is wrong.)

"defined behavior that is explicitly illegal to rely on" is a nice oxymoron.

What your certified compilers do or don't support is all a question of self-inflicted problems. (I happen to believe "certified" compilers are primarily a waste of time - with humans writing code/specifications, miscompilations are gonna be an extremely insignificant source of problems, and basically none if you do any amount of testing)

Again, you can't usefully encode "do_anything()" into a formal model. As an aside, that definition would also break the fundamental abstractions of the standard in amazingly deep ways. Regardless, my point in this particular comment thread is that eliminating undefined behavior is useful, not that I have some grudge against incompleteness.

The standards already have defined behavior that it's explicitly illegal to rely on, so I'm not sure why it's an oxymoron. Strictly conforming programs are prohibited from relying on implementation-defined behavior. You could start dealing with the issue of UB with a three-word modification of the rules in 4-3 (N3096), though any actual attempt would have to be much more surgical to avoid undoing a decade of compiler optimizations. This isn't an easy issue and I've never pretended otherwise.

Can't say I disagree about certified compilers (though it's extremely hard to detect miscompilations via testing). Regardless, they exist and regulators/certification authorities effectively require them. Since we all have to trust the code they produce with our lives, we may as well not ignore them.

Some attempts to come up with a case where gcc or clang optimize in a way not easily describable as a specific "do_anything()":

- printf (or any other external call) before UB - both gcc & clang keep the printf.

- write to atomic before UB - easy to reverse by writing the old value, the interim value needn't ever be visible.

- write to atomic/volatile, spinlock, UB - cannot be optimized out as the loop may be infinite (even in C++ as atomic & volatile are exceptions to "no infinite loops allowed")

- write to volatile before UB - both gcc and clang keep the write.

- read from volatile before UB - gcc keeps the read, but clang removes it. This is the closest I've got, but it's quite far from something you'd actually encounter (and could be easily countered by expecting volatile accesses to potentially exit(), at which point removing them is incorrect)

Now, granted, C doesn't guarantee that all UB time travel must be of the easily-reversed kind, but, seemingly, basically nothing would be lost if it were.
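For reference, a sketch of the general shape of the test cases in the list above: an observable side effect followed by an operation that is undefined for some inputs. The names are made up, device_register is just a stand-in for something like a memory-mapped register, and division by zero is used as the UB here only as an example. What a given compiler actually does with this depends on its version and flags, so you'd have to inspect the generated assembly yourself:

```c
#include <stdio.h>

/* Hypothetical stand-in for a memory-mapped register. */
volatile int device_register;

/* Volatile write, then an operation that is UB when divisor == 0. */
int write_then_divide(int value, int divisor)
{
    device_register = value;
    return value / divisor;
}

/* External call, then the same potentially-undefined operation. */
int print_then_divide(int value, int divisor)
{
    printf("about to divide %d by %d\n", value, divisor);
    return value / divisor;
}
```

Calling these with a nonzero divisor is fully defined, e.g. write_then_divide(10, 2) stores 10 and returns 5. The question in the list is what the compiler is allowed to do with the write or the call on the path where divisor turns out to be zero.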