| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by Dylan16807 1172 days ago

> C23 furthermore gives the compiler license to use an unreachable annotation on one code path to justify removing, without notice or warning, an entirely different code path that is not marked unreachable: see the discussion of puts() in Example 1 on page 316 of N3054.9

I don't agree with that description at all. Here's the code:

  1 if (argc <= 2)
  2   unreachable();
  3 else
  4   return printf("%s: we see %s", argv[0], argv[1]);
  5 return puts("this should never be reached");

The only code path that's "entirely different" is lines 1,4,5 and in that case of course you remove a return that's after a return.

And the other valid code path is 1,2,5, which has `puts` after `unreachable`.

To need `puts` you have to imagine a code path that gets past the "if" without taking either branch?

Maybe the author means something by "code path" that's very different from how I interpret it?

I would be pretty surprised if the above code means something different from:

  if (argc <= 2) {
    unreachable();
    return puts("this should never be reached");
  } else {
    return printf("%s: we see %s", argv[0], argv[1]);
    return puts("this should never be reached");
  }

6 comments

ternaryoperator 1172 days ago

This reminds me of a point made by the late Stan Kelly-Bootle, who for years wrote the Devil's Advocate column in UNIX Review magazine. In the early 1990s, he was discussing Microsoft's new C compiler and noted that in the promo material for the new compiler, it showed a benchmark for a loop that counted from 1 to 10,000 then printed "Hello". MS claimed that without optimization it took a few milliseconds, after optimization: 0 ms. A small asterisk explained the optimizer simply removed the loop. Kelly-Bootle pointed out, that the only reason a developer would write such a loop was to introduce a needed delay. Therefore, deleting the loop was not optimizing, but in fact pessimizing. And so, it was in fact Microsoft's Pessimizing C compiler.

codeflo 1171 days ago

Of course, that's technically incorrect. The way the standards are written, the compiler is free to replace the program with any other program that has the same (in a precisely defined sense) observable behavior (these are the famous "as if" formulations in language specs). Heating up the CPU is not considered observable behavior.

If someone really just wants a delay, it's easy to either (for programs running on normal OSs) call a sleep function, or (on tiny embedded systems) add an empty inline assembler statement that the compiler can't see through.

carlmr 1171 days ago

>Heating up the CPU is not considered observable behavior.

Neither is measuring delays of cached versus non-cached instructions. Yet it turns out to be very observable.

codeflo 1171 days ago

Of course these things are “observable” in the literal sense. And yet, they aren’t considered to be observable by the memory model of any language spec that I know of. Same as CPU power draw, which has been used as a side-channel to extract bits of crypto keys, and is very much influenced by common optimizations.

Practically, if you need to execute a specific sequence of machine instructions in order to prevent side-channel attacks, then you have to rely on assembler, compiler intrinsics and/or OS support. But that was true way before Spectre.

hyperhopper 1171 days ago

This is not true at all:

I've been many loops that turn into no-ops because all the functionality has been refactored out but this fact is hidden in function calls.

Sure, this should ideally be surfaced as a lint error, not a compiler optimization, but you cannot say that intentional delays are the "only" reason.

Also since processing time is variable, using that as a method should be extremely heavily discouraged/warned/require-opt-in

viraptor 1171 days ago

Those delay loops are common on microcontrollers and the usual solution is to either make the counter volatile or insert something opaque to the compiler in the loop body.

It would be of course nice if a warning was produced for that specific case: This whole loop was removed - is it really what you wanted, or is it a broken delay loop?

kzrdude 1171 days ago

I think it's a practical example of how the C language has made a journey to being more high abstraction than it used to be, in practice. And how that unsettles those used to the old behaviour.

wahern 1172 days ago

I think the point is that if the `argc <= 2` path is unreachable, then that means argc is always greater than 2, permitting the compiler to optimize the entire block to just:

  return printf("%s: we see %s", argv[0], argv[1]);

IOW, the conditional has been elided. But you're right in that the wording of the complaint doesn't match the example. The author presumably had in mind some of the more infamous NULL pointer-related optimizations, without spending the time to put together a properly analogous example.

dtolnay 1172 days ago

I interpreted the author's characterization to be about something like:

  1  if (argc <= 2)
  2    puts("A");
  3  puts("B");
  4  if (argc <= 2)
  5    unreachable();
  6  else
  7    return puts("C");
  8  return puts("D");

in which not just lines 4-6,8 go away (as you said) but also lines 1-2.

It makes sense to me but I can see why the author would characterize this situation as "license to use an unreachable annotation on one code path to justify removing an entirely different code path that is not marked unreachable". In a different world one might expect A to be printed "before the UB happens".

masklinn 1171 days ago

On the other hand, that has been the behaviour of optimising compilers in the face of UBs for years at this point, decades maybe. The linux kernel was hit by a deref' constraint propagation back in 2009 or so.

This is a behaviour I would absolutely expect from the construct, I would even qualify it as "the point".

rrobukef 1170 days ago

I find this especially surprising because line 2 may be exactly the reason why line 5 is unreachable. E.g. if puts("A") contractually throws an exception you cannot just remove it.

What am I missing in this example?

zefix 1170 days ago

C does not have exceptions...

rrobukef 1168 days ago

But it has long_jmp, right?

alwaysbeconsing 1171 days ago

One way to look at it (and I am not sure if this is correct, but it may be what the essay author meant) is to not treat the `unreachable` as affecting the presence of the decision, but only the result of the decision. If `unreachable` was replaced by a normal statement, we'd have:

    if (argc <= 2)
        do_something();
    else
        return printf("%s: we see %s", argv[0], argv[1]);

So the `return printf` is executed when `argc` is greater than 2. If we remove just the body of the first branch:

    if (argc <= 2)
        ;
    else
        return printf("%s: we see %s", argv[0], argv[1]);

the same thing holds. And additionally when `argc <= 2`, control will move past the `if`.

Under this view, if the `unreachable` won't cause the entire removal of the `if`, the compiler will produce the equivalent of:

    if (argc > 2)
        return printf("%s: we see %s", argv[0], argv[1]);

    return puts("this should never be reached")

Again, I don't say this is the correct interpretation, but it is one possibility, that would have to be ruled out by other parts of the standard.

Dylan16807 1171 days ago

I understand that interpretation, but that's what the end of my comment is about. If we treat unreachable as affecting the block it's in, but pretend it's not there for control flow, then the two versions of the code do different things. That's confusing and hard to preserve.

Asooka 1171 days ago

This just shows that "unreachable" is almost impossible to use safely. The only safe use of unreachable is if it is immediately after an instruction that makes the program stop running. It is not for "this cannot happen", because things that "cannot happen" happen all the time. If you use "unreachable", you're just asking for trouble and it seems the compiler authors are happy to oblige.

josephcsible 1171 days ago

This couldn't be more wrong. What you say to never use unreachable for is one of the most important use cases of unreachable. The whole point is to give the optimizer an assumption that it can't figure out on its own.

lldb 1171 days ago

One example of it being useful is unchecked std::variant access in c++ - there isn’t any api to access it like a union (if you already know the type) but you can mark the wrong type path unreachable to the same effect.

cryptonector 1172 days ago

There's no problem with this feature. I don't understand TFA's problem with it. As a programmer I get to not use `unreachable()` if I don't want to, and if I do I'm happy that the compiler takes my word for it and does the right thing. This is not at all like code elision in UB cases.

The `realloc()` change though...

_8j50 1172 days ago

Shouldn't the compiler warn or error on unreachable code?

codeflo 1171 days ago

This is not about code that's found to be unreachable through static analysis (where compilers might warn), but about a manual programmer annotation that claims the code is dynamically unreachable even though statically it might look otherwise.

benj111 1171 days ago

Why would you want that?

Is it to aid building for multiple targets? For debug builds?

flohofwoe 1171 days ago

Unreachable is mainly used as an optimization hint. For instance if you put an unreachable into the default branch of a continuous and non-exhaustive (from the pov of the compiler) switch-case statement, the compiler will not emit a range check for the jump table lookup.

ufo 1171 days ago

It helps optimization. One example is if you have code like this:

    if(condition) {
       error_stuff()
       abort();
    }
    normal_stuff();

If the compiler doesn't know that abort exits the program, they have to compile the normal_stuff path under the assumption that the error path might have run before it. This might result in suboptimal code.

Currently, many compilers support annotations such as __attribute__(noreturn) and __builtin_unreachable() to manually indicate that a code path is unreachable. C23 is now standardizing these features (with a slight tweak to the syntax).

_0ffh 1171 days ago

You can for example use it to give hints to the compiler that allows for optimisations, that it couldn't do otherwise.

Described e.g. here https://web.archive.org/web/20160508051118/http://blog.regeh...

Github https://github.com/preames/llvm-assume-hack

masklinn 1171 days ago

> Why would you want that?

To aid with optimisation, it basically lets you ask the compiler to remove branches, and provide constraints to the same.

An implementation might trap in debug code, but given no context would be provided you'd likely avoid this and would instead use your own wrapper macro to output a message of some sort in that case.

properparity 1171 days ago

But why put in unreachable? Doesn't make any sense to me.

If a branch is truly not supposed to ever happen, why have a branch at all? Just remove that code from the source entirely- that helps the optimizer even more, because the most optimal code is of course no code at all.

masklinn 1171 days ago

> But why put in unreachable? Doesn't make any sense to me.

Because sometimes you don't have a choice e.g. say you have a switch/case, if you don't do anything and none of the cases match, then it's equivalent to having an empty `default`. But you may want a `default: unreachable()` instead, to tell the compiler that it needs no fallback.

> If a branch is truly not supposed to ever happen, why have a branch at all? Just remove that code from the source entirely- that helps the optimizer even more, because the most optimal code is of course no code at all.

Except the compiler may compile code with the assumption that it needs to handle edge cases you "know" are not valid. By providing these branches-which-are-not, you're giving the compiler more data to work with. That extra data might turn out to be useless, but it might not.

benj111 1171 days ago

But this example isn't adding a constraint. The if statement is getting optimised away???

masklinn 1171 days ago

It is adding a constraint. The constraint is that argc can’t be smaller than 2. This is a literal “can’t”, as far as the compiler is concerned it’s a logical impossibility.

The branch containing the unreachable() obviously gets removed but the compiler then propagates the constraint (the condition for that illegal branch), and can prune any other path where `argc <= 2` upstream and downstream, as they are dead code per the constraint.