by friendlydude12 2924 days ago
Undefined behavior doesn’t exist to ease portability per se, it exists to handle meaningless yet syntactically valid constructs.

Shifting by a negative number is meaningless and is indicative of programming error. Instead of specifying that an implementation should check and handle these programming errors, which implies runtime costs, we use undefined behavior and leave the responsibility to the programmer.
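To make the runtime cost concrete, here is a minimal sketch in C. The names raw_shl, checked_shl and UINT_WIDTH_BITS are made up for illustration and are not from any standard library. The checked version pays for a comparison and a branch on every call, which is what a language-mandated check would amount to.

```c
#include <limits.h>
#include <stdbool.h>

/* Width of unsigned int in bits (assumes no padding bits, which holds on
   mainstream platforms). Hypothetical macro name. */
#define UINT_WIDTH_BITS ((int)(sizeof(unsigned int) * CHAR_BIT))

/* Raw shift: no check and no extra cost. The behavior is undefined if
   count < 0 or count >= UINT_WIDTH_BITS, so the caller must rule that out. */
unsigned int raw_shl(unsigned int value, int count)
{
    return value << count;
}

/* Checked shift (hypothetical helper): rejects out-of-range counts and
   reports failure instead of invoking UB. */
bool checked_shl(unsigned int value, int count, unsigned int *out)
{
    if (count < 0 || count >= UINT_WIDTH_BITS) {
        return false;   /* caller decides how to handle the error */
    }
    *out = value << count;
    return true;
}
```

With this, checked_shl(1u, -3, &r) returns false and leaves r alone, while raw_shl(1u, -3) is simply not allowed to happen.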

Many compilers assume the shift count was non-negative once a shift operation has executed, because otherwise the entire program would be undefined. You’re effectively asking for undefined programs to have defined behavior. That is like asking the compiler to correct your typos, which it can’t do. As they say: garbage in, garbage out. The compiler isn’t booby-trapping you; you’re booby-trapping yourself.
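A rough sketch of that inference, in C. Both function names are hypothetical, and whether a given compiler actually removes the branch depends on the compiler and optimization level; the point is only that it is permitted to.

```c
#include <limits.h>

/* Shift first, check second. Once the shift has executed, the compiler may
   assume count was in range, since any other value already made the
   program's behavior undefined. An optimizer is therefore allowed to
   delete the branch below. */
int shift_then_check(int count)
{
    int result = 1 << count;   /* UB unless the count is in range */

    if (count < 0) {
        return -1;             /* may be removed as unreachable */
    }
    return result;
}

/* Check first, shift second. The test runs before any UB can occur, so it
   cannot be optimized away. The upper bound also excludes shifting a 1
   into the sign bit, which is undefined for a signed int in C. */
int check_then_shift(int count)
{
    if (count < 0 || count >= (int)(sizeof(int) * CHAR_BIT) - 1) {
        return -1;
    }
    return 1 << count;
}
```

Only check_then_shift is safe to call with a count you have not validated yourself.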

I think GP is right that what you want is more definition but it has to be in specific cases because otherwise the runtime costs would be too high. In your negative shift example, the definition would be “shifting by a negative number results in garbage output,” but note that that would burden implementations where negative shifts trap.


> it exists to handle meaningless yet syntactically valid constructs.

Maybe more like "handle bad situations that cannot be precluded at compile time". And since C has "no runtime", and so can't do much "handling" at run time, it gets UB instead.

A construct doesn’t have to be impossible to preclude at compile time to be UB. The expression “1 << -3” can be precluded at compile time, yet it compiles just fine and invokes UB instead.

Doing something at runtime is a different concept from having a runtime.

In any case, C, in fact, often does have a runtime. What do you think executes main()?

Yes, 1<<-3 can be detected as UB at compile time when doing constant propagation. But that wasn't my point (you could declare this expression invalid, no need for UB). I was pointing out that UB is not about "meaningless yet syntactically valid constructs" (if there is such a thing at all; I won't argue). UB in general manifests at run time. So I was only proposing a reformulation.

> In any case, C, in fact, often does have a runtime. What do you think executes main()?

Sure, that's why I said it has "no runtime" (in quotes). I'm not trying to split hairs here.

But your formulation isn’t right because it’s not about the UB code being detectable at compile-time or run-time, it’s about describing how the UB code should behave regardless of when it can be analyzed.

In the context of real numbers sqrt(-1) is syntactically valid yet meaningless code. In other contexts, 1 << -3 is too. Also log(-1). Perfectly valid syntax but has no meaning.

The intent of scare-quotes tends to be ambiguous so I avoid them in discussion. http://frivolousquotationmarks.tumblr.com

You're right, they're often ambiguous, but often I don't know how to say it tersely in a different way. My assumption was that in that context the statement [C has "no runtime"] had a clear meaning, but maybe I was wrong. The point was to make it obvious that I don't want that to be taken too literally, and that I don't want to start a hair-splitting discussion about whether C has a runtime or not. Apparently it didn't accomplish that goal.

As to the formulation of UB, it's not about code ("syntactically valid constructs") but about manifestation. And manifestation is a run time thing. 1<<-3 "is" UB, yes. And that can be detected at compile time through constant propagation. But that's not why we need UB. We could just forbid 1<<-3, so there was no need for UB in the first place.

UB is really about values not known at compile time. The expression 1<<x might manifest in UB at run time, depending on the value of x. Since x in general is not known well enough to decide whether it will ever be negative, the question "will this expression ever manifest in UB at run time?" is in general undecidable, so the best answer is "possibly". That does not imply that it's always a bad idea to use the expression 1<<x, and it does not make the construct meaningless. (Really I don't think the construct itself has a meaning, but as I said I won't argue here).

And that's why I think the formulation "handle meaningless yet syntactically valid constructs" is misleading. It's not about the construct (i.e., syntactic construction). It's about run time values that lead to UB, and that are totally independent from the construct itself. They might come from a wrong function invocation, or from malicious user input. It is the programmer's duty to make sure, by whatever means, that UB never manifests.
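As one example of "by whatever means", here is a sketch in C of validating an untrusted shift count before it reaches the shift. The helper name shift_from_text is made up for illustration; strtol and errno are the standard library facilities.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: parse a shift count from untrusted text and compute
   1u << count only if the count is in range for unsigned int. */
bool shift_from_text(const char *text, unsigned int *out)
{
    char *end = NULL;
    errno = 0;
    long count = strtol(text, &end, 10);

    if (end == text || *end != '\0' || errno == ERANGE) {
        return false;   /* not a number, trailing junk, or out of long's range */
    }
    if (count < 0 || count >= (long)(sizeof(unsigned int) * CHAR_BIT)) {
        return false;   /* would be UB if used as a shift count */
    }
    *out = 1u << count;
    return true;
}
```

The construct 1u << count is the same whether the input is "5" or "-3"; only the run-time value decides whether the shift would have been UB, and the check keeps the bad values from getting there.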

Maybe that's just what you were trying to say, but I think it's important to be clear that it's not about the syntactic constructs.

I’m not saying that 1<<x is generally meaningless. I’m saying that whenever the shift operand is negative, it’s meaningless. Just like log(x) isn’t generally meaningless, it’s only meaningless when the argument is not positive.

Syntactic constructs in general should and do have meaning. The meaning isn’t inherent, but nothing has inherent meaning. log(x) has no inherent meaning, but when x is positive we have decided that it evaluates to the power to which you would raise 10 to get x.

Bad run-time values only cause UB in the context of improper usage in syntactic constructs. They aren’t independent of syntactic constructs: both syntactic constructs and values dictate behavior.

You’re right that UB only manifests itself at run-time. And you’re right that we wouldn’t need to use UB for constructs that only manifest their behavior at compile-time. Indeed, there exist syntactically valid yet meaningless C++ template substitutions that don’t result in UB; they result in compilation errors.