by greenhouse_gas 2946 days ago
The real issue isn't that C doesn't have a standard int overflow, but that it's undefined.

What they could have done is made it implementation defined, like sizeof(int), which depends on the implementation (hardware) but on the other hand isn't undefined behavior (so on x86/amd64 sizeof(int) will always be equal to 4).

4 comments

It's undefined for a reason.

  size_t size = unreasonable large number;
  char buf = malloc (size);
  char *mid = buf + size / 2;
  int index = 0;
  for (size_t x = 0; x < big number; x++) mid[index++] = x;
A common optimization by a compiler is to introduce a temporary

  char *temp = mid + index;
prior to the loop and then replace the body of the loop with

  *(temp++) = x;
If the compiler has to worry about integer overflow, this optimization is not valid.
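
To make the two forms concrete, here is a minimal sketch of the loop before and after that transformation. The function names fill_indexed and fill_bumped are made up for illustration, and the assert encodes the assumption the compiler is allowed to make: index never overflows.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Indexed form, as in the comment above: the address is formed from
   mid and a signed int index on every iteration. */
void fill_indexed(char *mid, size_t count)
{
    /* Assumption of this sketch: count fits in int, so index never overflows. */
    assert(count <= (size_t)INT_MAX);
    int index = 0;
    for (size_t x = 0; x < count; x++)
        mid[index++] = (char)x;
}

/* Pointer-bumping form: the shape the described optimization produces. */
void fill_bumped(char *mid, size_t count)
{
    char *temp = mid;   /* mid + index, with index starting at 0 */
    for (size_t x = 0; x < count; x++)
        *(temp++) = (char)x;
}
```

As long as index stays in range, both functions write the same bytes. If index were required to wrap at INT_MAX, the indexed form would jump backwards in the buffer while the bumped pointer would keep going forward, so the rewrite would no longer be equivalent.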

(I'm not a compiler engineer. Losing the optimization may be worthwhile. Or maybe compilers have better ways of handling this nowadays. I'm just chiming in on why int overflow is intentionally undefined in the Fine Standard.)

Are you sure this was the intent of the standard writers back in the mid-to-late 80s and not something that modern compilers just happened to take advantage of? I'd really expect it to be the former.
Integer overflow is certainly not undefined for this reason.

It's undefined because in the majority of situations, it is the result of a bug, and the actual value (such as a wrapped value) is unexpected and causes a problem.

For instance, oh, the Y2038 problem with 32 bit time_t.

>It's undefined because in the majority of situations, it is the result of a bug,

1. If it's a bug, it should overflow or crash (implementation defined, not undefined), or do what Rust does: crash at -O0 (or, if it's illegal to change defined behavior based on optimization level, create a --crash-on-overflow flag) and overflow on everything else.

2. There is plenty of code where it's intentional (such as the infamous if(a+5<a)).
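
For reference, the if(a+5<a) idiom relies on the addition wrapping, which is exactly what signed overflow being undefined does not promise, so a compiler may drop the test. A minimal sketch of a check that never performs the overflowing addition (add_would_overflow and checked_add are hypothetical names, not from any library):

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* True if a + b would fall outside the range of int.
   Only compares against the limits; the risky addition is never done. */
bool add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;
    if (b < 0)
        return a < INT_MIN - b;
    return false;
}

/* Stores a + b in *sum and returns true, or returns false on overflow. */
bool checked_add(int a, int b, int *sum)
{
    assert(sum != NULL);
    if (add_would_overflow(a, b))
        return false;
    *sum = a + b;
    return true;
}
```

This stays within defined behavior on any conforming implementation, at the cost of an extra comparison per addition.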

You meant

    char * buf = malloc(size);
You dropped an asterisk. Since changing pointers returned by malloc() is a bad idea, I'd make it:

    char * const buf = malloc(size);
This is only useful if buf is involved in some preprocessor macrology which perpetrates a hidden mutation of buf.

   BIG_MACRO(x, y, z, buf); // error!
Then the programmer is informed that, to his or her surprise, BIG_MACRO mutates buf, and can take appropriate corrective action.

It's also useful in C++, since innocent-looking function calls can steal mutable references:

   cplusplusfun(x, y, z, buf); // error: arg 4 is non-const ref
No such thing in C, though; function calls are pure pass-by-value.

Changing pointers returned by malloc is sometimes done:

   if ((newptr = realloc(buf, newsize)) != 0)
     buf = newptr;
   else
     ...
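
A minimal sketch of that realloc pattern wrapped in a helper. The name grow_buffer and its return convention are assumptions for illustration only; the point is that buf is overwritten only after realloc has succeeded.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: resizes *buf to newsize bytes.
   Returns 1 on success. On failure returns 0 and leaves *buf untouched,
   so the caller still owns (and must eventually free) the old block. */
int grow_buffer(char **buf, size_t newsize)
{
    assert(buf != NULL);
    assert(newsize > 0);    /* zero-size realloc is deliberately not handled here */

    char *newptr = realloc(*buf, newsize);
    if (newptr == NULL)
        return 0;
    *buf = newptr;
    return 1;
}
```

Note that a buf declared char * const could not be passed to a helper like this, which is the trade-off being discussed.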
In my experience, C code doesn't use const for anywhere near all of the local variables which could be so qualified.

If you enact a coding convention that all unchanged variables must be const, the programmers will just get used to a habit of removing the const whenever they find it convenient to introduce a mutation to a variable. "Oh, crap, error: x wasn't assigned anywhere before so it was const according to our coding convention. Must remove const, recompile; there we go!"

If you want to actually enforce such a convention of adding const, you need help from the compiler: a diagnostic like "foo.c: 123: variable x not mutated; suggest const qualifier".

I've never seen such a diagnostic; do you know of any compiler which has this?

I think that the average C module would spew reams of these diagnostics.

> If the compiler has to worry about integer overflow, this optimization is not valid.

I'm sure it's still possible to come up with an optimization that takes signedness into account and doesn't give up much performance or code size.

size_t is unsigned, overflow is defined.
The type of index is, however, signed int.
You're right, I read diagonally :)

However, the optimization argument for signed overflow seems weird to me, because I can't see any reason why this argument would not apply to unsigned overflow as well.

If we keep undefined behavior to optimize things like "if (n < n + 1)" when n is signed, why not do the same when n is unsigned?

Conversely, if there is a good reason not to, then why would it not apply to signed overflow as well?
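
To illustrate the unsigned side of this question: unsigned arithmetic is defined to wrap, so the comparison has one input where it is false and a compiler cannot fold it to a constant. A small sketch (the function name is made up):

```c
#include <limits.h>
#include <stdbool.h>

/* For unsigned n, n + 1 wraps to 0 when n == UINT_MAX,
   so this returns false for exactly that one value. */
bool unsigned_less_than_successor(unsigned int n)
{
    return n < n + 1u;
}
```

With a signed int parameter the same expression may legally be compiled as a plain "return true", because the only input that would make it false is the one that overflows.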

This case is not worth optimising, because the index should be size_t just like the original size. Then the compiler knows it won't overflow, and doesn't have to check.
And, the fix is easy: just use types of the same width for the counter and the boundary. Using a narrower counter is just begging for errors to happen. This is not a good coding style, and there is no point in having the compiler condoning it.

Compiling it and making it run? Sure. Bending over backwards to ensure it runs fast? Hell no.

Just a nitpick. Implementation is about the particular compiler and runtime (stdlib) implementation, not the hardware. Hardware is the platform hosting the implementation (these are terms defined by the ISO C standard).

A compiler targeting the x86 platform can implement sizeof(int) == 8, or whatever it pleases, as far as the C standard is concerned.

In practice compilers don't get creative about this. But there are real world cases where stuff is different, for example: http://www.unix.org/version2/whatsnew/lp64_wp.html
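
Since the width is the implementation's choice, code that depends on a particular width can state that at compile time instead of assuming it. A minimal sketch using C11 _Static_assert; the 32-bit requirement is an arbitrary assumption for the example, and int_size_bytes is a made-up name:

```c
#include <limits.h>
#include <stddef.h>

/* Assumption of this sketch: the surrounding code needs at least 32 bits
   in an int. On an implementation with a narrower int, compilation stops
   here instead of the program misbehaving at run time. */
_Static_assert(sizeof(int) * CHAR_BIT >= 32,
               "this code assumes int has at least 32 bits");

/* Reports what this particular implementation chose for int. */
size_t int_size_bytes(void)
{
    return sizeof(int);
}
```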

The modern case for keeping signed overflow as UB is that it unlocks compiler optimizations. For example, it allows compilers to assume that `x+1>x`.

If implementations are forced to define signed overflow, then these optimizations are necessarily lost. So implementation-defined is effectively the same as fully-defined.

I suppose the question is, which of these optimisations are actually useful for the compiler to do automatically? Yours is the example that's always thrown about, but it always seems like the kind of optimisation that the programmer should be responsible for.
> on x86/amd64 sizeof(int) will always be equal to 4

Nothing is stopping your C compiler from making the guarantee sizeof(int)=4 on x86/amd64.

I think you are in agreement with the comment you are replying to.
The comment suggested the standard make it implementation defined rather than undefined. There's not a meaningful difference here.

Even today, an implementation may define signed overflow.

Yes, there is. Implementation defined means that a conforming implementation _must_ document its behavior.

That means that programmers don’t have to use trial and error to figure out how the compiler behaves and don’t have to _hope_ they found all the corner cases.

And that is how we get #if defined(_THIS_THING_SOME_COMPILER_DEFINES) && !defined(__BUT_NOT_THIS_ONE_THAT_COMPILER_X_DEFINES) soup ;)
Better than silently ignoring an if guard meant to prevent an overflow, and then overflowing anyway on the addition.
Oh, I see, I wonder if greenhouse_gas is suggesting a feature similar to sizeof() that can be used to portably adapt your program's design to the target's overflow capability.
C language lawyer in training: sizeof is not a function.

The parentheses are only required when the operand is a type name, where they are part of the sizeof syntax itself; around an expression operand they are just ordinary, optional parentheses.
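
A short sketch of the two operand forms (function names are made up for the example):

```c
#include <stddef.h>

/* Expression operand: no parentheses needed, and x is not evaluated. */
size_t size_of_expression(void)
{
    int x = 0;
    return sizeof x;
}

/* Type-name operand: the parentheses are mandatory. */
size_t size_of_type(void)
{
    return sizeof(int);
}
```

Both return the same value on any given implementation, since x has type int.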