Hacker News
by pascal_cuoq 2250 days ago
One concrete reason why “unspecified” means “anything, and not always the same thing” is to enable as many optimizations as possible.

Write a function c that compares pointers in one compilation unit, and in another compilation unit, define:

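    int c(int *p, int *q);   /* defined in another compilation unit */
    int a, b, X1, X2;
    void f(void) {
        X1 = (&a == &b + 1);   /* may be folded at compile time */
        X2 = c(&a, &b + 1);    /* must call c at run time */
    }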
The compiler can optimize the computation of X1 on the basis that comparing an offset of &a to an offset of &b will always:

  - be false
  - or invoke undefined behavior
  - or be unspecified
But the optimization does not apply to the computation of X2, because the compiler cannot see inside c, so the two variables X1 and X2 can receive different values when you execute this example, even though they appear to compute the same thing.
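For concreteness, the other compilation unit might contain nothing more than the definition below. The name c comes from the comment, but its int * parameters and its body are assumptions, since the comment only says that c compares pointers. Because this body is compiled separately, the compiler handling X2 cannot see it (absent link-time optimization) and has to perform the comparison at run time.

```c
/* c.c: a separate compilation unit (hypothetical file name). When the
   caller is compiled, this body is not visible, so the compiler must emit
   a real call and a real pointer comparison instead of folding the result. */
int c(int *p, int *q) {
    return p == q;   /* a plain pointer equality test */
}
```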

I get why unspecified means that, and it’s good to know what the limit is for applying an optimisation, but I was asking why making the specific comparison of “one past the end” of one object with the beginning of another object unspecified would be useful. It’s cool that you can optimise it out, but what does a compiler gain from being able to do that?

Imagine a standard stated that > and < character comparisons involving '%' were unspecified. Why would this be good? It wouldn’t, so it’s not in any standard. But specifically it wouldn’t because (a) nobody writes ch < '%', and (b) if they did, compilers couldn’t make programs any faster, more portable, etc. because of that rule.

I guessed above that this is kinda like having hashmaps iterate in a random order: compilers do spooky things when you try to check whether two allocas/mallocs are adjacent, so don’t do it. Is that accurate? Or does it mean that compilers can move things around on the stack if they want, without worrying about updating the registers or locations that store the pointers, i.e., this is mainly to make compilers easier to write? If that’s it, I’d expect some other pointer comparisons to be on the list too. The reason it’s in there is what I wanted you to shed some light on.

Oh, that was your question. In this case, the reason why &a + 1 == &b is unspecified is that:

- it's generally false: there is no reason for b to be just after a in memory, so these two addresses compare unequal.

- it is sometimes true: when addresses are implemented as integers, compilers use exactly sizeof(T) bytes to represent an object of type T and do not waste precious integers by leaving gaps between objects, and pointer == is implemented as the assembly instruction that compares integers, that instruction sometimes produces true for &a + 1 == &b, because b happened to be placed just after a in memory.
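The second point can be sketched in C as follows, assuming a flat address space in which converting a pointer to uintptr_t yields its numeric address (the standard leaves this mapping implementation-defined, and uintptr_t itself is optional). The function names addr_equal and placed_adjacent are hypothetical.

```c
#include <stdint.h>

/* Models pointer == as a single integer comparison. Assumes a flat address
   space where converting a pointer to uintptr_t gives its address; the C
   standard leaves that mapping implementation-defined. */
int addr_equal(const void *p, const void *q) {
    return (uintptr_t)p == (uintptr_t)q;
}

/* Under the same assumption: does *b start exactly where *a ends?
   a + 1 is a valid one-past-the-end pointer even for a single object.
   For separate variables, the answer depends on where the compiler chose
   to place them, so it can change with the compiler, flags, or edits. */
int placed_adjacent(const int *a, const int *b) {
    return addr_equal(a + 1, b);
}
```

For elements of one array, placed_adjacent(&arr[0], &arr[1]) returns 1 on such platforms; for two separately declared variables it returns whatever the layout happens to be, which is why the standard declines to pin the answer down.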

In short, &a + 1 == &b was made unspecified so that compilers could implement pointer == with the integer equality instruction and could place objects in memory without leaving gaps between them. Anything more specific (such as “&a + 1 == &b is always false”) would have forced compilers to take additional measures to avoid giving the wrong answer.
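To make the cost of a stricter rule concrete, here is a purely hypothetical sketch of one such measure: always leaving an unused int between a and b, so that one past the end of a can never coincide with the start of b. The struct padded_pair, its member names, and the helper end_of_a_meets_b are illustrative assumptions; a struct is used only because C lets us fix member order, whereas real compilers lay out separate variables themselves. The integer comparison again assumes a flat address space where uintptr_t holds the address.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout an implementation could adopt if "&a + 1 == &b"
   had to be always false: a wasted int of padding between a and b. */
struct padded_pair {
    int a;
    int gap;   /* wasted storage: the "additional measure" */
    int b;
};

/* Members are laid out in declaration order, so b starts at least two
   ints after a; checked at compile time (C11). */
_Static_assert(offsetof(struct padded_pair, b) >= 2 * sizeof(int),
               "a gap separates a and b");

/* Compares one past the end of p->a with the start of p->b as integers,
   assuming a flat address space. With the gap, this is always 0. */
int end_of_a_meets_b(const struct padded_pair *p) {
    return (uintptr_t)(&p->a + 1) == (uintptr_t)&p->b;
}
```

The price is one wasted int per pair of adjacent objects, which is the kind of waste the answer has in mind when it mentions not leaving gaps between objects.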