by dzaima 845 days ago
UB is far from the only source of systems not doing the desired thing - code that ends up at UB is just as wrong as code written with an incorrect understanding of the behavior it invokes.

Sure, the neat trick of a+1<a not working is perhaps undesirable, but, even if signed addition were defined to wrap, in most contexts an "a+1" that ends up subtracting four billion is not gonna be the specific thing you want it to do in your system.
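
For reference, a minimal C sketch of the distinction being argued over. The helper names are hypothetical and not from the thread. The "a+1<a" form only works if the overflow has already happened, so a compiler that assumes no UB may fold it to false; comparing against INT_MAX before adding never overflows in the first place.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true if a + 1 would not fit in an int.
   The comparison happens before any addition, so no overflow occurs. */
bool increment_would_overflow(int a) {
    return a == INT_MAX;
}

/* Hypothetical helper: stores a + 1 in *out and returns true,
   or returns false and leaves *out untouched if a + 1 would overflow. */
bool checked_increment(int a, int *out) {
    if (increment_would_overflow(a)) {
        return false;
    }
    *out = a + 1;
    return true;
}
```

Either way the caller still has to decide what to do in the failing case, which is the point made above: defining the overflow does not by itself produce the value the system wanted.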

Alternatively, signed overflow could be defined to return exactly 31415, which would be very concrete defined behavior, but barely if at all more useful compared to it being UB.

I hope I didn't imply that UB was the only source of bugs. It obviously isn't. It's just the only source of bugs that has the side effect of undefining the semantics of all your other code.

Just for fun let's take your example and say signed overflow returns integer pi. That now means the compiler has to implement your (hypothetical) next line checking if the result is 31415 rather than omitting it under the assumption that it's unreachable because it would imply UB. All of that code suddenly has defined behavior, even if it's silly.

But what does it get you that it's a "defined but completely unusable value" versus "undefined"? Indexing an array by it, adding it to some previously-meaningful value, or doing anything else with it, is still gonna all do practically arbitrary things.

I suppose in some cases it can lead to bugs being harder to exploit, but it's still a bug and still wrong and still should be fixed. Being defined is not a get out of exploitability free card.

(ok I do have one case where "defined but completely arbitrary" is actually meaningful over "undefined", with no reasonable alternative in C: for a floating-point x, "x==(int)x" as a check for whether x exactly fits in an int. E.g. gcc on aarch64 or x86+AVX (requiring -fno-trapping-math for whatever reason) optimizes that to "x==floor(x)", since an fp-to-integer cast is undefined when the result overflows.)
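
A sketch of one workaround for the "x==(int)x" case, assuming x is a double and int is a two's-complement type no wider than 64 bits. The function name is hypothetical. It range-checks before casting so the cast is never evaluated on an out-of-range value, at the cost of the explicit bounds the one-liner was trying to avoid.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true if x is an integer value representable as int.
   (double)INT_MIN is an exact power of two, and so is its negation, which
   equals INT_MAX + 1. NaN fails both comparisons and yields false. */
bool fits_in_int(double x) {
    if (!(x >= (double)INT_MIN && x < -(double)INT_MIN)) {
        return false;           /* out of range (or NaN): do not cast */
    }
    return x == (double)(int)x; /* in range, so the cast is defined */
}
```

With this, fits_in_int(3.0) is true, while fits_in_int(3.5) and fits_in_int(1e10) are false without ever performing an overflowing conversion.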

It means you could know what the code will do, that's it. Even that's useful though. It means you can write complete formal models of the language and apply them against your code. The current situation is that you can only build partial formal models, and the assumptions those models rely on evaporate in the presence of UB. It's a really shitty way to do proofs.

Not knowing what the code will do also means that most of the safety critical code in your life is verified through a checkbox that essentially says "I promise there's no undefined behavior". For example, here's what MISRA says about undefined behavior:

    Rule 1.3: There shall be no occurrence of undefined or critical unspecified behaviour

    Analysis: Undecidable, System

It'd be nice to have at least the potential to analyze the code, both as one of the people writing safety-critical code and as a person who uses cars, planes, trains, etc.

You can absolutely write formal models in the presence of UB - encountering UB is just a call to do_anything(), and the set of scenarios in which UB happens is itself well-defined. Determining whether any UB can happen is as "undecidable" as determining whether the program follows a given specification - undecidable in the general case, but likely decidable for most specific cases.

Time travel may feel a little funky as you end up not being able to ensure anything leading up to UB happened, but that might not matter much - even if you have "shut_down_engines(); UB();" and are afraid of engines not ever getting shut down, the UB could equivalently also just run start_engines_back_up(), or even without UB some later code sees your off-by-four-billion number and thinks it really needs to (though yes you could have some truly-supposed-to-be-irreversible actions).

I'm pretty sure engineers expected to follow "there shall be no occurrence of UB" are also expected to follow "there shall be no occurrence of behavior we didn't ask you to write" in general - in a car/plane/train integer overflow is likely gonna result in some pretty undesirable behavior regardless of whether that's because the compiler messed with it or because now all your calculations are off by four billion. (and sometimes the compiler can even optimize based on UB to some more desirable code, e.g. "x-y<0" to "x<y" for signed integers, or expanding the range of lengths a loop works on by promoting the index variable)
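
To make the "x-y<0" to "x<y" example concrete, here is a small C sketch with hypothetical helper names. It emulates wrapping subtraction through unsigned arithmetic, assuming the implementation converts out-of-range unsigned values to int by wrapping (gcc and clang document this; the C standard leaves it implementation-defined).

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: subtraction with two's-complement wraparound.
   Unsigned arithmetic wraps by definition; converting an out-of-range
   result back to int is implementation-defined, assumed here to wrap. */
int wrapping_sub(int x, int y) {
    return (int)((unsigned)x - (unsigned)y);
}

/* What "x - y < 0" would mean if signed overflow were defined to wrap. */
bool less_via_wrapping_difference(int x, int y) {
    return wrapping_sub(x, y) < 0;
}

/* What the compiler may rewrite "x - y < 0" to when overflow is UB. */
bool less_direct(int x, int y) {
    return x < y;
}
```

For x = INT_MIN and y = 1 the wrapped difference is INT_MAX, so the wrapping version reports "not less" while the direct comparison gives the mathematically correct answer. The two agree whenever x - y does not overflow.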

And you do have UB sanitizers (and perhaps it'd be neat to have compilers have an option to define as much as is reasonable for absolutely critical software that for whatever reason was written in C).

And you cannot even meaningfully have an equivalent to sanitizers on defined operations - if an operation is explicitly defined, people may rely on it, and therefore it is unacceptable to ever warn on it! (ok Rust does do a funky thing of making integer overflow trap on debug builds, and be defined to wrap on release ones, but to me this does not seem like a reasonable approach to have on many things)

The scenarios in which UB can happen aren't actually well defined by the standards. They're just the negative space outside the constraints. I'll grant that most of the useful scenarios are listed though.

Time travel and inconsistency also prevent the "do_anything()" model from working. There is no consistent behavior in the presence of UB, and the program is not even guaranteed to be translated correctly leading up to that point.

As for running sanitizers on defined operations, all you would need to do is add a new category alongside implementation-defined, unspecified, and undefined behavior: defined behavior that it's explicitly illegal to rely on. You could also treat unspecified behavior this way, though I'd need to think about how dangerous that is.

Speaking of sanitizers, most certified compilers don't actually support them. I've unsuccessfully tried to convince a couple vendors that they're important and even gave them an appropriate bare metal runtime to use if only they'd do the work of calling it. No luck.

What happened up to "do_anything()" cannot matter - if you don't like interpreting it as actual time travel, you could alternatively interpret it as the UB rearranging the atoms of the universe to look like some different past happened - no time travel, but the result is the same. (done literally you might encounter some issues with physics, but in most practical scenarios reversing some operation after it has happened is plenty simple; and in cases where it's not, a C compiler most likely couldn't even have a way to optimize it out, as arbitrary code may include "exit()", at which point removing the invocation is wrong)

"defined behavior that is explicitly illegal to rely on" is a nice oxymoron.

What your certified compilers do or don't support is all a question of self-inflicted problems. (I happen to believe "certified" compilers are primarily a waste of time - with humans writing code/specifications, miscompilations are gonna be an extremely insignificant source of problems, and basically none if you do any amount of testing)