It says everything. Even if undefined behavior works one way on all platforms today, it could work differently tomorrow in a way that introduces bugs.
It's nonsensical to blame the compiler for conforming to the standard in a way that breaks code using undefined behavior.
The C standard includes an appendix that lists ~200 examples of undefined behavior. This list does not claim to be exhaustive.
Often, what constitutes undefined behavior is non-obvious (and not well justified). For example, when adding two signed integers results in an overflow, it is undefined behavior even if your program never uses the result.
Because of how C defines "undefined" behavior, all of the guarantees we rely on to ensure security go out the window whenever the programmer steps on one of these land mines.
Not all UB falls into this category. A lot of UB, such as your signed integer addition example, is dependent upon the behavior of the underlying hardware. Certain archs may throw an exception on signed integer overflow, or exhibit otherwise inconsistent behavior, for example. The standard is the standard, of course, but not all implementations inherit the UB of the standard.
Whether or not something is "undefined behavior" has nothing to do with the hardware. The C specification says what is specified, what is "unspecified", what is "implementation defined", and what is "undefined".
If something is "undefined" according to C, you can't rely on what the hardware does, because the hardware might not even get a chance to do anything. Compilers may completely elide sections of your program, and they do in practice (for example, bounds checks).
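To make the bounds-check case concrete, here is a minimal sketch in C. The helper name `range_fits` and its parameters are made up for illustration. The comment shows a pointer-based check that a compiler may drop, since forming a pointer beyond one past the end of an object is already undefined. The function performs the equivalent test using only `size_t` arithmetic, which is well defined.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Fragile form, for contrast only:
 *
 *     if (buf + len < buf) { reject(); }
 *
 * If buf + len would wrap, the addition itself is undefined, so the
 * compiler may assume the comparison is never true and remove it.
 */

/* Hypothetical helper: returns true when `len` bytes starting at offset
 * `off` fit inside a buffer of `cap` bytes. */
bool range_fits(size_t cap, size_t off, size_t len)
{
    /* Written with a subtraction so that off + len is never computed
     * and therefore cannot wrap around. */
    return off <= cap && len <= cap - off;
}
```

Because every operand is unsigned, nothing in `range_fits` can be optimized away on the grounds that overflow "cannot happen".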
Actually, hardware always does something reasonable for add instructions (throw exception or overflow or saturate). It's additions in C that can have unreasonable results.
> A lot of UB, such as your signed integer addition example, is dependent upon the behavior of the underlying hardware.
The problem is that you can no longer depend on that, because the compiler writers have decided they can do anything during optimization when there is undefined behavior.
What about I-know-what-I'm-doing const_casts in critical software like operating systems and kernels? Do you think developers should distribute binaries they know work, or let people maybe cause bugs with compiler options?
There is no guarantee it would work if it relies on undefined behavior--that's what undefined means, and also the reason the optimizer acted the way it did.
In that case, adding flags that define what happens when a particular piece of undefined behavior is hit would probably be the way to go, or else rewriting the code so that it doesn't rely on undefined behavior.
The concept of "undefined" is incoherent. At the same time as people insist the compiler can do anything under the circumstances, everyone accepts that there is some limit to what it is reasonable to expect it to do. It's all just quibbling over where exactly the limits are. But as long as there are limits, the definition of undefined was never valid.
It seems to me that the problem is that trying to define undefined behavior is an inherent contradiction.
Setting aside the question of what exactly "undefined behavior" means, why does a language spec have to include it? If there is behavior that cannot be defined, why not just omit it from the standard?
> Setting aside the question of what exactly "undefined behavior" means, why does a language spec have to include it? If there is behavior that cannot be defined, why not just omit it from the standard?
The original reason was that there were things they didn't want to define. For example, signed integer overflow works differently on different hardware architectures. If they defined one behavior in the standard then compilers for architectures that didn't do it that way would have to do something inefficient to make it work the way the standard says it should rather than the way that hardware actually does it.
Calling it "undefined behavior" lets the compiler do whatever the hardware does even if that means the program produces different results on different architectures. It also means that if some new architecture comes out that does it slightly differently, nobody can be surprised when compilers use the native overflow behavior for that architecture.
The flaw was in giving compilers too much discretion. They were generally expected to implement one of the sane versions of signed integer overflow, and specifically the one corresponding to the relevant hardware architecture, but according to the spec they can literally do whatever they want. So we get this:
Which means you can't use that to check whether signed overflow occurred even when you know the underlying hardware behavior, because if it did occur you've already invoked UB and the compiler is allowed to do anything, including omit your check, which it does.
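The usual shape of such a check is `a + b < a` for a positive `b` (that shape is assumed here, not taken from the comment above). A minimal sketch of a check that stays within defined behavior, by comparing against the limits before adding, might look like this. `checked_add` is a hypothetical name.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a + b in *out and returns true when the sum
 * is representable as an int. Returns false, without performing the
 * addition, when it is not. */
bool checked_add(int a, int b, int *out)
{
    /* INT_MAX - b (for b > 0) and INT_MIN - b (for b < 0) cannot overflow,
     * so the test itself never invokes undefined behavior. */
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *out = a + b;
    return true;
}
```

GCC and Clang also provide `__builtin_add_overflow`, which reports the same condition without the manual limit comparisons.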
What would help a bit is if compilers are going to do something like this, they emitted a warning something like "comparison is always false because signed integer overflow is undefined."
What would help even more is for the next version of the standard to convert a lot of this undefined behavior into implementation-defined behavior or similar, which still allows for hardware-specific implementations but requires them to be documented and prevents a lot of this unintuitive ex post facto "optimization" that causes more trouble than it's worth.
"Calling it "undefined behavior" lets the compiler do whatever the hardware does even if that means the program produces different results on different architectures"
Isn't this "implementation dependent", rather than "undefined"?
> The original reason was that there were things they didn't want to define.
For signed integer overflow, maybe. I don't claim to know how this evolved in every last detail, but this is definitely not what UB is currently for. There is specifically "implementation-defined behavior" (actual behavior must be documented by the implementation) and "unspecified behavior" (can be non-deterministic, possibly limited to a set of permitted outcomes) for what you are describing.
Undefined behavior is what allows many optimizations to be made in the first place, and it is also necessary so that compilers don't have to solve the halting problem.
> What would help a bit is if compilers are going to do something like this, they emitted a warning something like "comparison is always false because signed integer overflow is undefined."
Yes, in that specific case that would be a useful warning. Linters can do that for you. But compilers make use of this assumption all the time, for example when optimizing for loops. Would you like a warning every time the compiler made your loops faster by relying on this UB? Every time a pointer is dereferenced?
> What would help even more is for the next version of the standard to convert a lot of this undefined behavior into implementation-defined behavior or similar, which still allows for hardware-specific implementations but requires them to be documented and prevents a lot of this unintuitive ex post facto "optimization" that causes more trouble than it's worth.
For a lot of UB that is not even an option. How do you find the correct initialization order for dynamic initialization? You can't, you'd have to solve the halting problem. It's the programmer's job to get this right, not the compiler's. What should messing this up result in, if not UB?
And you may not like it, but p0907 (which requires signed integers to use two's complement) also proposed making signed integer overflow defined, and that part of the proposal was firmly rejected. You put "optimization" in quotes but that's exactly what this is about - in practice it would make tons of code (in particular loops) significantly slower to eliminate this UB. You're free to doubt WG21 but I won't.
> Would you like a warning every time the compiler made your loops faster by relying on this UB?
Yes, because then I know to convert the loop counter to unsigned, which it ought to be anyway so that there isn't problematic behavior if the signed value actually did overflow when using a compiler or compiler flags that don't perform that optimization.
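For what it's worth, a sketch of the unsigned-counter version. `count_nonzero` is a hypothetical function. The point is only that `size_t` arithmetic wraps by definition, so no optimization of this loop can depend on overflow being undefined. Whether the generated code ends up faster or slower than with a signed counter depends on the compiler and target.

```c
#include <stddef.h>

/* Hypothetical example: counts the nonzero bytes in buf[0..n-1]. */
size_t count_nonzero(const unsigned char *buf, size_t n)
{
    size_t count = 0;

    /* i is unsigned and always less than n inside the loop, so i++ can
     * neither wrap nor trigger undefined behavior. */
    for (size_t i = 0; i < n; i++) {
        if (buf[i] != 0) {
            count++;
        }
    }
    return count;
}
```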
> Every time a pointer is dereferenced?
Every time a pointer is dereferenced and the compiler uses that fact to cause some other statement to have no effect? I want to see that warning, yes.
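The pattern in question, sketched in C with a made-up `struct packet`. In the ordering shown in the comment, the dereference comes first, so the compiler may conclude that `p` is non-null and treat the later check as dead code. The function below tests before dereferencing, so the check cannot be removed on those grounds.

```c
#include <stddef.h>

/* Hypothetical type, for illustration only. */
struct packet {
    int length;
};

/*
 * Fragile ordering:
 *
 *     int len = p->length;        // dereference: compiler may infer p != NULL
 *     if (p == NULL) return -1;   // may then be removed as unreachable
 *     return len;
 */

/* Returns the packet length, or -1 when p is NULL. */
int packet_length(const struct packet *p)
{
    if (p == NULL) {
        return -1;
    }
    return p->length;
}
```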
> You put "optimization" in quotes but that's exactly what this is about - in practice it would make tons of code (in particular loops) significantly slower to eliminate this UB.
That's an argument for why it shouldn't be two's complement, not for why it has to be fully undefined behavior. If you're going to make signed integers never overflow when used as a loop counter, what's wrong with documenting that and offering a warning in -Wall or -Wextra when it happens?
And it's nothing specifically to do with signed integer overflow. If you're removing code the programmer wrote or making conditional statements unconditional because it can only happen in the presence of UB, that's a huge red flag that there is a bug in that program and the compiler should not be silent about it.
> But as long as there are limits, the definition of undefined was never valid.
There are always limits. Your CPU is (for the most part) deterministic, and no amount of UB will change that (well, the nuclear missiles launched due to UB might...).
> It seems to me that the problem is that trying to define undefined behavior is an inherent contradiction.
Here is the definition of UB according to the C++ standard:
"This document imposes no requirements on the behavior of programs that contain undefined behavior."
Don't try to define or reason about the consequences of UB, that's pointless. Just don't provoke any undefined behavior and you get to live in the clearly defined world of the standard.
> Setting aside the question of what exactly "undefined behavior" means, why does a language spec have to include it? If there is behavior that cannot be defined, why not just omit it from the standard?
"X is UB" means "compiler writers may freely assume that X is not done". If you omit that then compilers would have to verify that X is not done, and there are requirements in the standard which would require the halting problem to be solved in order to verify them in user code. The standard likes to avoid forcing compilers to solve the halting problem.
"This document imposes no requirements on the behavior of programs that contain undefined behavior."
The NY Vehicle and Traffic law imposes no requirements on the behavior of drivers who engage in cannibalism. However, it would be odd to interpret this as meaning that if you commit cannibalism, you are exempt from all rules regarding motor vehicles.
There are clearly two kinds of "undefined" behavior - the kind that is defined as undefined, and the kind that is not. To understand either, you have to understand both.
> The NY Vehicle and Traffic law imposes no requirements on the behavior of drivers who engage in cannibalism.
Are you trying to argue that the standard quote is unclear? That you think it can be read "imposes no additional/special requirements" (because that's the interpretation that your traffic law argument assumes)? Because if you ignore the nonsensical meaning, I would read your traffic law sentence as "imposes no requirements whatsoever".
Regardless of what your stance is regarding possible ambiguity in the way that sentence is worded, both the intent and the practical consequences of that statement are abundantly clear: If your program has UB (per what the C++ standard considers UB), then the C++ standard makes absolutely no guarantees about what will happen when you run it.
> There are clearly two kinds of "undefined" behavior - the kind that is defined as undefined, and the kind that is not. To understand either, you have to understand both.
I don't understand what you are trying to say. There is only one kind of undefined behavior. If you follow the rules of the C++ standard you get to live in a nice and predictable world. If you don't, anything can happen and you're on your own.