Hacker News new | ask | show | jobs
by AlotOfReading 844 days ago
The scenarios in which UB can happen aren't actually well defined by the standards. They're just the negative space outside the constraints. I'll grant that most of the useful scenarios are listed though.

Time travel and inconsistency also prevent the "do_anything()" model from working. There is no consistent behavior in the presence of UB, and the program is not even guaranteed to be translated correctly leading up to that point.

As for running sanitizers on defined operations, all you would need to do is add a new kind of behavior alongside implementation defined, unspecified, and UB with defined behavior that it's explicitly illegal to rely on. You could also treat unspecified in this way, though I'd need to think how dangerous that is.

Speaking of sanitizers, most certified compilers don't actually support them. I've unsuccessfully tried to convince a couple vendors that they're important and even gave them an appropriate bare metal runtime to use if only they'd do the work of calling it. No luck.

1 comments

What happened up to "do_anything()" cannot matter - if you don't like interpreting it as actual time travel, you could alternatively interpret it as the UB rearranging the atoms of the universe to look like some different past happened - no time travel, but result is the same. (done literally you might encounter some issues with physics, but in most practical scenarios reversing some operation after it has happened is plenty simple; and in cases where it's not a C compiler most likely couldn't even have a way to optimize it out, as arbitrary code may include "exit()" at which point removing the invocation is wrong)

"defined behavior that is explicitly illegal to rely on" is a nice oxymoron.

What your certified compilers do or don't support is all a question of self-inflicted problems. (I happen to believe "certified" compilers are primarily a waste of time - with humans writing code/specifications, miscompilations are gonna be an extremely insignificant source of problems, and basically none if you do any amount of testing)

Again, you can't usefully encode "do_anything()" into a formal model. As an aside, that definition would also break the fundamental abstractions of the standard in amazingly deep ways. Regardless, my point in this particular comment thread is that eliminating undefined behavior is useful, not that I have some grudge against incompleteness.

The standards already have defined behavior that it's explicitly illegal to rely on, so I'm not sure why it's an oxymoron. Strictly conforming programs are prohibited from relying on implementation-defined behavior. You could start dealing with the issue of UB by a 3 word modification of the rules in 4-3 (N3096), though any actual attempt would have to be much more surgical to avoid undoing a decade of compiler optimizations. This isn't an easy issue and I've never pretended otherwise.

Can't say I disagree about certified compilers (though it's extremely hard to detect miscompilations via testing). Regardless, they exist and regulators/certification authorities effectively require them. Since we all have to trust the code they produce with our lives, we may as well not ignore them.

Some attempts to come up with a case where gcc or clang optimize in a way not easily describable as a specific "do_anything()":

- printf (or any other external call) before UB - both gcc & clang keep the printf.

- write to atomic before UB - easy to reverse by writing the old value, the interim value needn't ever be visible.

- write to atomic/volatile, spinlock, UB - cannot be optimized out as the loop may be infinite (even in C++ as atomic & volatile are exceptions to "no infinite loops allowed")

- write to volatile before UB - both gcc and clang keep the write.

- read from volatile before UB - gcc keeps the read, but clang removes it. This is the closest I've got, but it's quite far from something you'd actually encounter (and could be easily countered by expecting volatile accesses to potentially exit(), at which point removing them is incorrect)

Now, granted, C doesn't guarantee that all UB time travel must be of the easily-reversed kind, but, seemingly, basically nothing would be lost if it were.