| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by muvlon 28 days ago

Fair enough!

> And if it's not succeeded for 54 years, "try harder", or "just never make a mistake", is at least not the solution.

And I 100% agree. UB is way overused by these standards for how dangerous it is, and as a consequence using C (and C++) for anything nontrivial amounts to navigating a minefield.

2 comments

webstrand 27 days ago

I think as compilers got smarter, UB changed somewhat in meaning. Originally the compilers didn't perform such complex analysis, and while invoking UB could break your program, it would still do something reasonable.

marcosdumay 27 days ago

Yes, but compilers got smart enough for it to be a problem around 30 years ago, and we are still arguing about what to do.

pjmlp 27 days ago

You see a reasoning here, basically when all those C compiler benchmarks started, vendors moved from what Frank Allen described, to anything goes to win SPEC something benchmarks.

"Oh, it was quite a while ago. I kind of stopped when C came out. That was a big blow. We were making so much good progress on optimizations and transformations. We were getting rid of just one nice problem after another. When C came out, at one of the SIGPLAN compiler conferences, there was a debate between Steve Johnson from Bell Labs, who was supporting C, and one of our people, Bill Harrison, who was working on a project that I had at that time supporting automatic optimization...The nubbin of the debate was Steve's defense of not having to build optimizers anymore because the programmer would take care of it. That it was really a programmer's issue.... Seibel: Do you think C is a reasonable language if they had restricted its use to operating-system kernels? Allen: Oh, yeah. That would have been fine. And, in fact, you need to have something like that, something where experts can really fine-tune without big bottlenecks because those are key problems to solve. By 1960, we had a long list of amazing languages: Lisp, APL, Fortran, COBOL, Algol 60. These are higher-level than C. We have seriously regressed, since C developed. C has destroyed our ability to advance the state of the art in automatic optimization, automatic parallelization, automatic mapping of a high-level language to the machine. This is one of the reasons compilers are ... basically not taught much anymore in the colleges and universities."

-- Fran Allen interview, Excerpted from: Peter Seibel. Coders at Work: Reflections on the Craft of Programming

saagarjha 28 days ago

What should the behavior above be defined to do?

tsukikage 28 days ago

“Implementation defined behaviour”: compiler author chooses, and documents the choice.

A lot of UB should be implementation defined behaviour instead; this would much better match programmers’ intuitions as they reason about their code - you can even see it in the comments of this post: it’s always things like “this hardware supports / doesn’t support unaligned accesses”, it’s never nasal demons.

tardedmeme 28 days ago

I told someone at a conference that UB actually means "implementation-defined, no documentation required". He started to refute me and then stopped.

VorpalWay 27 days ago

That isn't true, for UB the compiler is allowed to assume the UB can never happen. For example if you dereference a pointer and only after check if it is NULL, the compiler can remove the NULL check, since it is clearly impossible (nevermind that you might be on a microcontroller where NULL is a valid address).

The fallout of this are quite large! If behaviour is implementation defined the compiler has to stick to one consistent behaviour. No such need for UB, you can get different behaviour bu changing unrelated code, by changing between debug and release or just because of what garbage happened to be on the stack.

Since the compiler is allowed to assume the UB doesn't happen it will also sometimes look like the compiler miscompiled your code elsewhere, but what actually happened was some inlining followed by extrapolating "this can never happen".

UB is often surprising: I have seen unaligned loads crash on x86 due to it bring UB in C (even though x86 is generally fine with it). But once a newer compiler decided that it was fine to vectorise that code (since it clearly aligned) the CPU was no longer happy with it.

thomashabets2 27 days ago

I think parent commenter made a joke. UB can be seen as "implementation defines this to reformat your hard drive. No we don't document it".

That is, the compiler de facto defines what happens when you compile UB code.

So you're not wrong, but I think you missed the sarcastic spin of parent.

Maxatar 27 days ago

>That is, the compiler de facto defines what happens when you compile UB code.

That is not what undefined behavior is though, that is unspecified behavior.

The entire point of undefined behavior is to cover the cases where the compiler can't define the semantics of your program either because doing so is genuinely not possible, or is incredibly onerous to deduce, or would require introducing runtime checks whose performance cost is at odds with C and C++'s predominant use cases.

marcosdumay 27 days ago

Except that UB doesn't mean that. UB means "the developer must never write this".

munch117 27 days ago

Both are wrong. It means "this standard does not constrain the behaviour of code that does this".

It's entirely legal for implementations to have predictable behaviour, documented or not, for code that is undefined by the standard. In their quest for maxxing benchmark performance they generally choose not to, but there's really nothing in any standard that stops you from making an implementation that prioritises safety.

tardedmeme 27 days ago

Every implementation so far has predictable behavior in all cases. Sometimes the rules for predicting it are very obscure. But it's all fully defined within the compiler's binary code. And none of them link to nasal portals.

Filligree 28 days ago

Print x twice. Not all “side effects” care about order.

Better yet, define an order for parameter evaluation.

HelloNurse 27 days ago

There is an easy way to take control: read the volatile variable only once.

  volatile int x = 5;
  ...
  int y=x;
  printf("%d in hex is 0x%x.\n", y, y);

poppadom1982 28 days ago

You're missing the point. Volatile forces two loads of a value that may have changed in the middle. So the value of "x" may depend on the time/order of load.

AnimalMuppet 28 days ago

Which is, if I understand correctly, the entire point of volatile. Don't use it if you don't want that behavior.

And in fact, in the example given, if there is something (another thread or whatever) that can change the value of x, then you don't know what either number will be. Well, in that circumstance, without volatile, it may print the same number both times, but you still don't know what the number will be (unless the read gets optimized away entirely).

chuckadams 28 days ago

If that behavior is the entire point, then I think the bigger point is that the spec should reflect that and not call it undefined.

voakbasda 28 days ago

I suspect that many undefined behaviors reflect the inability of the standard committee to come to a consensus on the nuances involved. “Punt to the implementers” is a way to allow every tool vendor to select their own expected behavior in those cases.

hmry 28 days ago

Why is that missing the point? Loading it twice, possibly with different values, is the intended behavior. It's only undefined because the C spec doesn't specify the order of the loads (unlike most other languages which have a perfectly well-defined order for side effects in a single expression).

rowanG077 28 days ago

What you are describing is implementation defined behavior. Using that is perfectly safe and reasonable. Undefined means this programs is malformed.

hmry 27 days ago

No I'm just repeating what the original comment said, which is that it's explicitly UB:

"5.1.2.4.1 says any volatile access - including just reading it - is a side effect. 6.5.1.2 says that unsequenced side effects on the same scalar object (in this case, x) are UB. 6.5.3.3.8 tells us that the evaluations of function arguments are indeterminately sequenced w.r.t. each other."

If function arguments were sequenced with respect to each other, it wouldn't be a problem.

But actually, maybe the original comment is wrong. Presumably "indeterminately sequenced" and "unsequenced" mean different things, although I don't have a copy of the standard at hand to check.

echoangle 28 days ago

Couldn’t you just define that function arguments are evaluated left to right?

Or just throw an error.

jfoks 28 days ago

Why? Just for this edge case? It could be faster and/or allow smaller code size to allow this to be undefined.

Undefined is also different from "depends on the compiler", because which behavior is chosen can even depend on the circumstances, whatever code appears before and/or after it.

That said, UB in code, such as this example of ordering of reads of volatile parameters being undefined, does not automatically mean that code that uses it is bad. It may very well be that the function being called doesn't misbehave either way.

echoangle 27 days ago

That’s the point of the whole article. It’s not worth the speed gain to have a language that nobody can safely use because you can’t really prevent UB when you write it.

> It may very well be that the function being called doesn't misbehave either way.

The function being good or bad has nothing to do with the UB. The UB occurs before the function is called.

saagarjha 28 days ago

I meant reading the uninitialized variable

muvlon 28 days ago

There is no uninitialized variable, I explicitly initialized it to 5.

And yes indeed, C could do what Rust does and define the order of evaluation for function arguments.

If the argument expressions are indeed side-effect-less, the compiler can always make use of the "as-if" rule and legally reorder the computation anyway, for example to alleviate register pressure.

lll-o-lll 28 days ago

HCF

saagarjha 28 days ago

I have good news about what UB allows

JadeNB 28 days ago

What is that?

FabHK 28 days ago

A fictitious assembly instruction (and pretty good TV series).

https://en.wikipedia.org/wiki/Halt_and_Catch_Fire_(computing...

SAI_Peregrinus 27 days ago

Halt and Catch Fire

jeffffff 28 days ago

Compilation error

saagarjha 28 days ago

It’s hard to detect all UB at compile time

Demiurge 28 days ago

It’s harder depending on the language, which is clearly the point.