| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by Gibbon1 1325 days ago
	> And the reason it hasn't been "fixed" is that compilers can optimize code better if they can assume that signed addition won't overflow. Everyone says that and they have no proof.

1 comments

wizeman 1325 days ago

> Everyone says that and they have no proof.

Here's an entire blog post with proof:

https://kristerw.blogspot.com/2016/02/how-undefined-signed-o...

link

a1369209993 1325 days ago

None of that requires or even supports undefined overflow.

> Signed integer expression simplification [...]

This is easily addressed by allowing (but not requiring) implementations to keep wider intermediate results on signed overflow, to a unspecified width at least as wide as the operands.

> the index calculations are typically done using 32-bit int

No, they're done using size_t; that's what size_t is for.

> undefined overflow ensures that a[i] and a[i+1] are adjacent.

No, the fact that i (of type size_t) < length of a <= size of a in bytes <= SIZE_MAX ensures that a[i] and a[i+1] are adjacent.

(Also, note that this does not mean the optimizer is allowed to access memory at &a[i+1], since that might be across a page boundary (or even the wraparound from address ...FFF to 0), which makes adjacency less helpful than one might hope.)

> Value range calculations [...] Loop analysis and optimization [...]

Allowed but not required to retain excess bits on signed overflow.

link

wizeman 1325 days ago

> > Signed integer expression simplification [...]

> This is easily addressed by allowing (but not requiring) implementations to keep wider intermediate results on signed overflow, to a unspecified width at least as wide as the operands.

> > Value range calculations [...] Loop analysis and optimization [...]

> Allowed but not required to retain excess bits on signed overflow.

What does that even mean? That the code would have different behavior with a different compiler, different optimizations or when you slightly change it (depending on whether the compiler chooses to keep wider intermediate results or not)?

If understand you correctly, then depending on the values of x and y, `(x+1)<(y+3)` would have a different result than `(x+a)<(y+b)` when a=1 and b=3, because in the first case you would be simplifying the expression but in the second case you couldn't.

That would be quite surprising, to say the least.

> No, they're done using size_t; that's what size_t is for.

  for (int i = 0; i < 256; i++)
      a[i] = b[i] + c[i];

So you've never seen code like this? Not all arrays are huge and for small indices people usually go for `int`.

Also, when doing arithmetic with indices, you might need to represent a negative index, so `size_t` wouldn't work. You'd need to use `ssize_t`, which is a signed integer which would benefit from these optimizations.

But even then you might use an `int` if you know the arithmetic result will fit.

> No, the fact that i (of type size_t) < length of a <= size of a in bytes <= SIZE_MAX ensures that a[i] and a[i+1] are adjacent.

Not if `i` is a signed integer, say, `int` or `int8_t`. Which is the point of the optimization.

link

a1369209993 1325 days ago

> depending on the values of x and y, `(x+1)<(y+3)` would have a different result than `(x+a)<(y+b)` when a=1 and b=3

If x > TYPEOFX_MAX or y > TYPEOFY_MAX-2, then this can already happen with the more (maximally) vague "signed overflow is completely undefined" policy; wider intermediates just mean that code is not allowed to do other things, like crash or make demons fly out of your nose.

If x+[a or 1] and y+[3 or b] don't overflow, then computing them in a wider signed integer type has no effect, since the same values are produced as in the original type.

More generally, retained overflow bits / wider intermediates (instead of undefined behaviour) mean that when you would have gotten undefined behaviour due to integer overflow, you instead get a partially-undefined value with a smaller blast radius (and hence less opprotunity for a technically-standards-conformant compiler to insert security vulnerabilities or other insidious bugs). In cases where you would not have gotten undefined behaviour, there is no signed integer overflow, so the values you get are not partially-undefined, and work the same way as in the signed-overflow-is-undefined-behaviour model. ... you know what; table:

  signed overflow is \  no overflow     unsigned overflow  signed overflow
  undefined behaviour:  correct result  truncated mod 2^n  your sinus cavity is a demon roost
  wide intermediates:   correct result  truncated mod 2^n  one of a predictable few probably-incorrect results

link

wizeman 1325 days ago

> If x > TYPEOFX_MAX or y > TYPEOFY_MAX-2, then this can already happen with the more (maximally) vague "signed overflow is completely undefined" policy; wider intermediates just mean that code is not allowed to do other things, like crash or make demons fly out of your nose.

Yes, but saying "signed overflow is completely undefined" simply means that you are not allowed to do that, so this is a very well-defined policy and as an experienced programmer you know what to expect and what code patterns to avoid (hopefully).

If you say "signed overflow is allowed" but then your code behaves nondeterministically (i.e. giving different results when signed overflow happens, depending on which compiler and optimization level you're using or exact code you've written or slightly changed), I would argue that would actually be more surprising for an experienced programmer, not less!

It would make such signed overflow bugs even harder to detect and fix! As it would work just fine for some cases (or when certain optimizations are applied or not, or when you use a certain compiler version or Linux distro) but then it would completely break in a slightly different configuration or if you slightly changed the code.

And it would prevent tools like UBSan from working to detect such bugs because some code would actually be correct and rely on the signed overflow behavior that you've defined, so you couldn't just warn the programmer that a signed overflow happened, as that would generate a bunch of false alarms (especially when such signed-overflow-relying code was part of widely used libraries).

> More generally, retained overflow bits / wider intermediates (instead of undefined behaviour) mean that when you would have gotten undefined behaviour due to integer overflow, you instead get a partially-undefined value with a smaller blast radius (and hence less opprotunity for a technically-standards-conformant compiler to insert security vulnerabilities or other insidious bugs).

C compilers are already allowed to do what you say, currently. But I'm not sure that relying on that behavior would be a good idea :)

I think it's preferable that the C standard says that you are not allowed to overflow signed integers, because otherwise the subtlety of what happens on signed overflow would be lost on most programmers and it would be very hard to catch such bugs, especially due to code behaving differently on slightly different configurations (hello, heisenbugs!) and also because bug-detection tools couldn't flag signed overflows as invalid anymore.

link

a1369209993 1325 days ago

> saying "signed overflow is completely undefined" simply means that you are not allowed to do that

No, saying "signed overflow is a compile-time error" means you're not allowed to do that. Saying "signed overflow is completely undefined" means you are allowed to do that, but it will blow up in your face (or, more likely, your users' faces) with no warning, potentially long after the original erroneous code change that introduced it.

> As it would work just fine for some cases (or when certain optimizations are applied or not, or when you use a certain compiler version or Linux distro) but then it would completely break in a slightly different configuration or if you slightly changed the code.

That sentence is literally indistinguishable from a verbatim quote about the problems with undefined behaviour. Narrowing the scope of possible consequences of signed overflow from "anything whatsoever" to "the kind of problems that are kind of reasonable to have as a result of optimization" is a strict improvement.

> And it would prevent tools like UBSan from working

> bug-detection tools couldn't flag signed overflows as invalid

The standard says `if(x = 5)` is well-defined, but that doesn't stop every competently-designed (non-minimal, correctly configured[0]) compiler from spitting out something to the effect of "error: assignment used as truth value".

0: Arguably a fully competently designed compiler would require you to actually ask for -Wno-error rather than having it be the default, but backward compatibility prevents changing that after the fact, and it would require configuring -Wall-v1.2.3 (so build scripts didn't break) anyway.

link

Gibbon1 1325 days ago

Blog post has theory but no performance measurements.

link

wizeman 1325 days ago

Well, then you have moved the goal post, because initially you didn't ask for performance measurements, you asked for proof that compilers can optimize code better if they can assume signed arithmetic won't overflow.

And the blog post lists dozens of optimizations which can indeed be performed due to that decision.

The benefits of these optimizations will vary depending on the actual code and on the architecture/CPU, of course.

It will be greater for hot and tight loops that benefit from these optimizations and less important for cold parts of the code.

It will also be greater for more limited CPUs and less important for more sophisticated ones.

link