Hacker News new | ask | show | jobs
by pornel 690 days ago
That's a common misconception. Advanced compilers like GCC and LLVM have optimizers working primarily on a common intermediate representation of the program, which is largely backend-independent, and written for semantics of the C abstract machine.

UB has such inexplicable and hard to explain side effects, because all the implicit assumptions in the optimization passes, and symbolic execution used to simplify expressions follow semantics of the C spec, not the details of the specific target hardware.

2 comments

Programmers have an ideal obvious translation of C programs to machine instructions in their head, but there's no spec for that.

It creates impossible expectations for the compilers. My recent favorite paradox is: in clang's optimizer comparing addresses of two variables always gives false, because they're obviously separate entities (the hardcoded assumption allows optimizing out the comparison, and doesn't stop variables from being in registers).

But then clang is expected to avoid redundant memcpys and remove useless copies, so the same memory location can be reused instead of copying, and then two variables on the stack can end up having the same address, contradicting the previous hardcoded result. You get a different result of the same expression depending on order of optimizations. Clang won't remove this paradox, because there are programs and benchmarks that rely on both of these.

And yet, in practice the C have written over the last 20+ years mostly ended up with "undefined behaviour" = "what CPU does"...

That may not be true in fact but it ends up that way because "undefined behaviour" tends to be implemented "to work" and we're used to behaviour of common CPUs that may have fed into expectation of what "to work" should be...