Hacker News new | ask | show | jobs
by benj111 28 days ago
>It's partly the standards fault here - rather than saying "We don't know how vendors will implement this, so we shall leave it as implementation-defined", they say "We don't know how vendors will implement this, so we will leave it as undefined

I'd agree to a point. I still think it's unreasonable for compiler writers to get all lawyery about precise terminology. After all "implementation defined" could still be subject to the same lawyeriness (we implemented it, ergo we define it).

To me this is an issue of culture. We need to push back against the view that UB means anything can happen, therefore the compiler can do anything.

2 comments

But it's genuinely useful. In all seriousness, are you sure you aren't perhaps just using the wrong language? At this point UB and leveraging it for optimization are core parts of the most performant C implementations.

That said, I think there are many cases where compilers could make a better effort to link UB they're optimizing against to UB that appears in the code as originally authored and emit a diagnostic or even error out. But at least we've got ubsan and friends so it seems like things are within reason if not optimal.

>are you sure you aren't perhaps just using the wrong language

Well I think there is a tension here. C is the language for microcontrollers and the language for high performance.

In ye olden days both groups interests were aligned because speed in C was about working with the machine. Now the UB has been highjacked for speed, that microcontroller that I'm working on, where I know and int will overflow and rely on that is UB so may be optimised out, so I then have to think about what the compiler may do.

I wouldn't say C is the wrong language. I would say there are wrong compilers though.

> At this point UB and leveraging it for optimization are core parts of the most performant C implementations.

I am skeptical that NULL-pointer checks being removed contribute anything more than a rounding error in performance gains in any non-trivial program.

I got a measurable improvement from eliminating a null pointer check within the last week. Billions of devices have arm little cores, and the extra branch predictor pressure and frontend bandwidth from those instructions can be significant.

A standard way to eliminate those is to invoke undefined behavior if some condition is not met;

    if (a == NULL) {
      __builtin_unreachable();
    }
Which then allows elimination of the null check in later code, possibly after inlining some function.
This series was a good explanation for me of why treating UB this way is genuinely useful: https://blog.llvm.org/2011/05/what-every-c-programmer-should...

Being able to assume certain things don't happen is powerful when you're writing optimisations, not doing that would have a real performance cost

> Being able to assume certain things don't happen is powerful when you're writing optimisations, not doing that would have a real performance cost

A few of those are significant performance gains, the majority are not.

Emitting the instruction for a NULL pointer dereference is effectively no more costly than not emitting that instruction.

It's the code removal that's killing me.

What if the compiler is able to use that to determine that a whole code path is dead, and then significantly improve the surrounding function because of that?

Compilers optimise in multiple passes and removing things earlier can expose optimisation opportunities later that can affect other parts of the code too

> What if the compiler is able to use that to determine that a whole code path is dead,

Then it should warn "unreachable code".

> and then significantly improve the surrounding function because of that?

It's not simply the removal that is the problem, it's that the code is silently removed.

I don’t mean this in a rude way but you should really read the posts I linked, it’s interesting and part 3 especially answers these questions

Direct link to part 3 (but read the others first for context if you can): https://blog.llvm.org/2011/05/what-every-c-programmer-should...

You don’t want warnings for every piece of code in a library you’re not using or sanity check you added that isn’t supposed to be hit

And you can’t warn when you’re optimising based on undefined behaviour because you can’t know when it will happen, that is equivalent to the halting problem

If you warned whenever undefined behaviour could be happening then e.g. every single pointer deference would say “warning: compiling assuming pointer is not null or unaligned”

> I don’t mean this in a rude way but you should really read the posts I linked, it’s interesting and part 3 especially answers these questions

I have read the entire series. Both in the past and more recently.

> If you warned whenever undefined behaviour could be happening then e.g. every single pointer deference would say “warning: compiling assuming pointer is not null or unaligned”

I am not proposing that at all, I am proposing (which the series you link to does not preclude) that eliding code on the basis of UB can be determined by the compiler. If the compiler can determine that some code block needs to be elided, then the reason for that elision has already been determined, in which case it can issue a warning when "reason" == "pointer already used" when eliding the null check.

Right. But to take the first example, the value of initialised memory.

It's undefined so it doesn't have to be zeroed therefore increasing efficiency.

But it's also UB so if you do know that memory contains something, you can't take advantage of that because it's UB. Having it UB is fine. It's the compilers assuming UB can't happen and optimising it away.