by cjensen 2946 days ago
It's undefined for a reason.

  size_t size = unreasonable large number;
  char buf = malloc (size);
  char *mid = buf + size / 2;
  int index = 0;
  for (size_t x = 0; x < big number; x++) mid[index++] = x;
A common optimization by a compiler is to introduce a temporary

  char *temp = mid + index;
prior to the loop and then replace the body of the loop with

  *(temp++) = x;
If the compiler has to worry about integer overflow, this optimization is not valid.
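A rough sketch of the two forms side by side, using small concrete sizes in place of the elided "unreasonable large number" and "big number". The function names are made up for illustration. Both forms write the same bytes as long as index never overflows. If index were required to wrap, the indexed form would suddenly jump far behind mid while the pointer form kept marching forward, which is why the rewrite only holds when overflow can be ignored.

```c
#include <stddef.h>
#include <string.h>

/* Original form: index through mid with a signed int counter. */
void fill_indexed(char *mid, size_t count)
{
    int index = 0;
    for (size_t x = 0; x < count; x++)
        mid[index++] = (char)x;
}

/* Rewritten form: a temporary pointer walks forward instead. */
void fill_pointer(char *mid, size_t count)
{
    int index = 0;
    char *temp = mid + index;   /* computed once, before the loop */
    for (size_t x = 0; x < count; x++)
        *(temp++) = (char)x;
}

/* Returns 1 if both forms leave identical bytes in a small buffer. */
int forms_agree(void)
{
    char a[64] = {0}, b[64] = {0};
    fill_indexed(a + 32, 16);
    fill_pointer(b + 32, 16);
    return memcmp(a, b, sizeof a) == 0;
}
```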

(I'm not a compiler engineer. Losing the optimization may be worthwhile, or maybe compilers have better ways of handling this nowadays. I'm just chiming in on why int overflow is intentionally undefined in the Fine Standard.)

6 comments

Are you sure this was the intent of the standard writers back in the mid-to-late 80s and not something that modern compilers just happened to take advantage of? I'd really expect it to be the former.
Integer overflow is certainly not undefined for this reason.

It's undefined because in the majority of situations, it is the result of a bug, and the actual value (such as a wrapped value) is unexpected and causes a problem.

For instance, oh, the Y2038 problem with 32-bit time_t.
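For a concrete sense of where the 32-bit limit falls, the sketch below asks the C library for the UTC year of the largest value a signed 32-bit counter can hold. The helper name last_32bit_year is hypothetical.

```c
#include <stdint.h>
#include <time.h>

/* UTC year of the last second a signed 32-bit seconds counter can hold.
   Returns -1 if the library cannot convert the value. */
int last_32bit_year(void)
{
    time_t t = (time_t)INT32_MAX;   /* 2147483647 seconds after the epoch */
    struct tm *utc = gmtime(&t);
    return utc ? utc->tm_year + 1900 : -1;
}
```

One second past that instant no longer fits in 32 signed bits.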

>It's undefined because in the majority of situations, it is the result of a bug,

1. If it's a bug, it should wrap or crash (implementation-defined, not undefined), or do what Rust does: crash at -O0 (or, if it's illegal to change defined behavior based on optimization level, create a --crash-on-overflow flag) and wrap on everything else.

2. There is plenty of code where it's intentional (such as the infamous if(a+5<a)).
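The idiom in point 2 relies on the sum wrapping. Because signed overflow is undefined, a compiler is allowed to assume a + 5 < a is never true for int a and drop the check. A sketch of a test that stays within defined behavior compares against INT_MAX before adding; the function name is made up for illustration.

```c
#include <limits.h>

/* Returns 1 if a + 5 would exceed INT_MAX, without ever computing a + 5. */
int add5_would_overflow(int a)
{
    return a > INT_MAX - 5;
}
```

Usage would be along the lines of: if (add5_would_overflow(a)) handle the error, else use a + 5.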

You meant

    char * buf = malloc(size);
You dropped an asterisk. Since changing pointers returned by malloc() is a bad idea, I'd make it:

    char * const buf = malloc(size);
This is only useful if buf is involved in some preprocessor macrology which perpetrates a hidden mutation of buf.

   BIG_MACRO(x, y, z, buf); // error!
With the const in place, the programmer is informed that, to his or her surprise, BIG_MACRO mutates buf, and can take appropriate corrective action.

It's also useful in C++, since innocent-looking function calls can steal mutable references:

   cplusplusfun(x, y, z, buf); // error: arg 4 is non-const ref
No such thing in C, though; function calls are pure pass-by-value.
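A small illustration of the pass-by-value point, with hypothetical function names: the callee reassigns its own copy of the pointer, and the caller's pointer is unaffected.

```c
#include <stddef.h>

/* Hypothetical callee: overwrites only its local copy of the pointer. */
static void reassign(char *p)
{
    p = NULL;
    (void)p;   /* silence "set but not used" warnings */
}

/* Returns 1 if the caller's pointer survives the call unchanged. */
int caller_pointer_unchanged(void)
{
    char c = 'x';
    char *buf = &c;
    reassign(buf);
    return buf == &c;
}
```

The callee can still modify what buf points at; it just cannot change buf itself without being handed a char **.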

Changing pointers returned by malloc is sometimes done:

   if ((newptr = realloc(buf, newsize)) != 0)
     buf = newptr;
   else
     ...
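Wrapped up as a helper, the realloc pattern above might look like the sketch below. The name grow_buffer and its return convention are assumptions for illustration, and it assumes newsize is nonzero. The point is that the old pointer is only overwritten once realloc has succeeded, so a failure does not leak the original block.

```c
#include <stdlib.h>

/* Grows *buf to newsize bytes. On failure *buf is left untouched and is
   still owned by the caller. Returns 1 on success, 0 on failure. */
int grow_buffer(char **buf, size_t newsize)
{
    char *newptr = realloc(*buf, newsize);
    if (newptr == NULL)
        return 0;
    *buf = newptr;
    return 1;
}
```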
In my experience, C code doesn't use const for anywhere near all of the local variables which could be so qualified.

If you enact a coding convention that all unchanged variables must be const, the programmers will just get used to a habit of removing the const whenever they find it convenient to introduce a mutation to a variable. "Oh, crap, error: x wasn't assigned anywhere before so it was const according to our coding convention. Must remove const, recompile; there we go!"

If you want to actually enforce such a convention of adding const, you need help from the compiler: a diagnostic like "foo.c: 123: variable x not mutated; suggest const qualifier".

I've never seen such a diagnostic; do you know of any compiler which has this?

I think that the average C module would spew reams of these diagnostics.

> If the compiler has to worry about integer overflow, this optimization is not valid.

I'm sure it's still possible to come up with an optimization that takes signedness into account and doesn't give up much performance or code size.

size_t is unsigned, so overflow is defined.
The type of index is, however, signed int.
You're right, I read diagonally :)

However, the optimization argument for signed overflow seems weird to me, because I can't see any reason why this argument would not apply to unsigned overflow as well.

If we keep undefined behavior to optimize things like "if (n < n + 1)" when n is signed, why not do the same when n is unsigned?

Conversely, if there is a good reason not to, then why would it not apply to signed overflow as well?
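To make the unsigned case concrete: since unsigned arithmetic is defined to wrap, the comparison has a definite answer for every input, and it is false at UINT_MAX. A compiler therefore cannot fold it to "always true" the way it may for signed n. The function name is made up for illustration.

```c
#include <limits.h>

/* For unsigned n, n + 1 wraps to 0 at UINT_MAX, so this is not always 1. */
int unsigned_less_than_next(unsigned n)
{
    return n < n + 1u;
}
```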

This case is not worth optimising, because the index should be size_t just like the original size. Then the compiler knows it won't overflow, and doesn't have to check.
And the fix is easy: just use types of the same width for the counter and the boundary. Using a narrower counter is just begging for errors to happen. This is not good coding style, and there is no point in having the compiler condone it.

Compiling it and making it run? Sure. Bending over backwards to ensure it runs fast? Hell no.
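The same-width suggestion, applied to the loop from the top of the thread, might look like this sketch (the function name is hypothetical). With index declared as size_t it has the same width as the bound, so it can never run past count.

```c
#include <stddef.h>

/* Same loop as the original example, but the index matches the bound's type. */
void fill_same_width(char *mid, size_t count)
{
    size_t index = 0;
    for (size_t x = 0; x < count; x++)
        mid[index++] = (char)x;
}
```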