Hacker News new | ask | show | jobs
by tardedmeme 30 days ago
I told someone at a conference that UB actually means "implementation-defined, no documentation required". He started to refute me and then stopped.
2 comments

That isn't true, for UB the compiler is allowed to assume the UB can never happen. For example if you dereference a pointer and only after check if it is NULL, the compiler can remove the NULL check, since it is clearly impossible (nevermind that you might be on a microcontroller where NULL is a valid address).

The fallout of this are quite large! If behaviour is implementation defined the compiler has to stick to one consistent behaviour. No such need for UB, you can get different behaviour bu changing unrelated code, by changing between debug and release or just because of what garbage happened to be on the stack.

Since the compiler is allowed to assume the UB doesn't happen it will also sometimes look like the compiler miscompiled your code elsewhere, but what actually happened was some inlining followed by extrapolating "this can never happen".

UB is often surprising: I have seen unaligned loads crash on x86 due to it bring UB in C (even though x86 is generally fine with it). But once a newer compiler decided that it was fine to vectorise that code (since it clearly aligned) the CPU was no longer happy with it.

I think parent commenter made a joke. UB can be seen as "implementation defines this to reformat your hard drive. No we don't document it".

That is, the compiler de facto defines what happens when you compile UB code.

So you're not wrong, but I think you missed the sarcastic spin of parent.

>That is, the compiler de facto defines what happens when you compile UB code.

That is not what undefined behavior is though, that is unspecified behavior.

The entire point of undefined behavior is to cover the cases where the compiler can't define the semantics of your program either because doing so is genuinely not possible, or is incredibly onerous to deduce, or would require introducing runtime checks whose performance cost is at odds with C and C++'s predominant use cases.

Sorry, by "de facto defines" I meant that it factually does something, even if that "something" is "segfault the compiler at build time".

That "de facto" did some heavy lifting.

Except that UB doesn't mean that. UB means "the developer must never write this".
Both are wrong. It means "this standard does not constrain the behaviour of code that does this".

It's entirely legal for implementations to have predictable behaviour, documented or not, for code that is undefined by the standard. In their quest for maxxing benchmark performance they generally choose not to, but there's really nothing in any standard that stops you from making an implementation that prioritises safety.

Every implementation so far has predictable behavior in all cases. Sometimes the rules for predicting it are very obscure. But it's all fully defined within the compiler's binary code. And none of them link to nasal portals.
How do you propose to predict the behavior of a true race condition with only the binary, faithfully translated by the compiler?

Moreover, this is at best an incredibly pedantic point, not something that changes how programmers need to approach UB. You can't review the source code of a compiler that hasn't been written yet.

I didn't suggest that implementations should entirely eliminate every form of UB. There is plenty of middle ground. For example, you could easily limit the consequences of integer overflow by specifying or partially specifying overflow behaviour, with very little runtime cost.

I'm not suggesting you change how you write code, but with a better implementation the code that you do write - that lives in the real world where mistakes are made - might work better. How is that being pedantic?

An interesting case where compiler writers did something like that is casting via union members, but I'm running out of time, so we can talk about that another day.

It's fully defined by your CPU's silicon masks and your compiler's binary code that one of several things will happen.
Turns out that when you're implementing network applications, the set of things that could happen also depends on what the script kiddie on the other side of the globe feels like this morning.

Some would prefer less excitement than this.

C code should be more predictable and easier to reason about than using a macro assembler. To the extent it is not, the language has failed.