by jcranmer 2117 days ago
My gut instinct is as follows:

  void *x = malloc(...);
  void *y = malloc(...);
  assert(x != y); // standard guarantees this [1]
Yet it's fairly reasonable that:

  void *x = malloc(...);
  free(x);
  void *y = malloc(...); // malloc reused x's allocation here.
So, in effect, guaranteeing that the results of two mallocs can never alias each other, while allowing the implementation to reuse freed memory, requires semantically adjusting the value of a pointer to a unique, unaddressable value.
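One way to see the reuse described above without ever reading the freed pointer is to save its numeric address before calling free. This is only a sketch: `allocation_was_reused` is a hypothetical helper name, it assumes the platform provides `uintptr_t` (optional in the standard, though nearly universal), and whether the allocator actually hands back the same address is entirely up to the implementation.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if a second allocation landed at the same
   numeric address as an earlier, already-freed one, 0 if it did not, and
   -1 if either malloc call returned NULL. */
int allocation_was_reused(size_t size)
{
    void *x = malloc(size);
    if (x == NULL)
        return -1;

    /* Save the address as an integer while x is still a valid pointer. */
    uintptr_t old_addr = (uintptr_t)x;
    free(x); /* x itself is not read again after this point */

    void *y = malloc(size);
    if (y == NULL)
        return -1;

    /* Integer comparison: no freed pointer value is involved. */
    int reused = ((uintptr_t)y == old_addr);
    free(y);
    return reused;
}
```

Calling `allocation_was_reused(64)` may return 1 with many allocators, but neither result is something a program should depend on.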

[1] I think, but I'm not sure which versions of C/C++ added this guarantee

3 comments

This would be kind of hilarious if true.

It seems like they could have just said: malloc won't give you a pointer that overlaps with the storage of any live malloc'd object. Such a malloc is implementable without too much trouble. But instead, they gave a stronger guarantee--that all malloc'd pointers would be "unique". It would be unboundedly burdensome on the implementation to meet this property, so what do they do? Update the standard to offer the achievable guarantee? No! They add a new rule, ensuring that it's impossible to observe that the stronger guarantee is not met without doing something "illegal". Instead of getting their act together, they have elected to punish whistleblowers.

On second thought, the choice they've made isn't as sadistic as it sounds. I was thinking of the standard as a contract between the language implementor and the programmer, but actually it is a contract between the language implementor, the programmer, and arbitrarily many other programmers. The stance they have chosen mandates a social convention, that the names of the dead will never be spoken. If everyone builds their APIs with this covenant in mind, it makes it possible to use pointers as unique ids (for whatever that's worth). C has never had much in the way of widely-followed social conventions, so practically speaking, the only way to ensure everyone knows they can depend on other programmers behaving this way is for the compiler to flagellate anyone who steps out of line.
I feel like you should not need to carve out extra language in the standard to explain this. It's very clear that following *x is undefined after free(x). It's also clear from any reasonable understanding of malloc that x's numeric value might collide with a later allocation coincidentally. Why should that make "if (x)" undefined?

It's true that the result of "if (x == y)" would depend on coincidences lining up and you should not rely on either one. Calling any evaluation of x "undefined" seems much more extreme than that though.

At first glance, it does seem like a gratuitous instance of undefined behavior. However... what could you actually meaningfully do with a pointer to freed memory anyway? You clearly can't dereference it. The only pointers you can compare it to are other pointers to the now-freed object (cross-object pointer comparisons are UB in C) and NULL. But if you were going to compare it to NULL, it's presumably to guard against a dereference, so you'd end up with UB anyway if you didn't overwrite it with NULL in the first place, at which point it's not being compared with anything anymore.
I can think of one use.

Let's say you have two pointers that are sometimes unique and sometimes aliases. Maybe they mean semantically different things but they happen to be the same for some cases, and different in others. They always are on the heap. You want to clean them up when your function exits, freeing them both, or once if they are not unique.

    free(p);
    if (p != q)
       free(q);
Believe it or not, I have written something like this, although with integer file descriptors being closed rather than heap buffers freed. E.g., sometimes one fd is passed for both reading and writing, and sometimes you have a unique fd for each, but it all needs to hit close(2).

To exist within the standard I guess you need to do the comparison first:

    if (p != q)
       free(p);
    free(q);
Edit: ah, but you just said "cross-object pointer comparisons are UB". I can't see a good reason for that either, but I do suppose it might make some architecture's non-linear pointer representation work better.
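The compare-first ordering can be wrapped in a small function so the comparison always happens while both pointers are still valid. A minimal sketch, assuming both arguments came from malloc (or are NULL); `free_pair` is a hypothetical name, and the return value exists only to make the behavior easy to check.

```c
#include <stdlib.h>

/* Hypothetical helper: frees p and q, which may be aliases of each other.
   Returns the number of free() calls made (1 if aliased, 2 otherwise). */
int free_pair(void *p, void *q)
{
    int calls = 1;
    if (p != q) { /* both pointers are still valid at this comparison */
        free(p);
        calls = 2;
    }
    free(q); /* free(NULL) is a no-op, so NULL arguments are harmless */
    return calls;
}
```

With `void *a = malloc(16);`, the call `free_pair(a, a)` releases the block once, while two separate allocations are each released exactly once.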
For some reason, I thought equality and inequality pointer comparisons required pointers to be in the same object. It's actually only the relational operators that are undefined if the pointers are not in the same object (although I believe most compilers will treat them as unspecified instead of undefined).
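To make the distinction concrete, here is a small sketch of the two cases that are well-defined: `==`/`!=` between pointers to different live objects, and `<` between pointers into the same array. The function names are hypothetical and exist only for illustration.

```c
/* Equality operators: fine across distinct live objects. */
int distinct_objects_compare_unequal(void)
{
    int a = 0, b = 0;
    int *pa = &a;
    int *pb = &b;
    return pa != pb; /* pointers to two different objects compare unequal */
}

/* Relational operators: defined when both pointers point into the same array. */
int same_array_ordered(void)
{
    int arr[4] = {0};
    int *lo = &arr[1];
    int *hi = &arr[3];
    return lo < hi; /* ordering follows the element indices */
}
```

Writing `pa < pb` in the first function would be the case the standard leaves undefined, since `a` and `b` are unrelated objects.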
Malloc can return NULL so there is no guarantee of unique return values.
NULL != NULL, so at least no two values will be equal, right?