Hacker News new | ask | show | jobs
by BeeOnRope 2753 days ago
You can usually optimize integer arithmetic just fine, including the example you gave (both forms are equivalent - try it!).

Floating point arithmetic is different, but Java gives itself wiggle room by not exactly specifying many results unless you choose "strict math". That's not UB though: it's just a range of possible outcomes.

Java can't have UB in the C/C++ sense, since it would break the security sandbox. It certainly has things without specifically defined values, such as hashCode() and what happens under data races isn't entirely deterministic, but it doesn't approach UB in the C/C++ sense.

1 comments

Yeah I’m painfully aware of the FP gotchas. But are you saying there are usually never any issues with integer arithmetic and overflow vs. optimizations (reordering, common subexpressions etc)? A branch like “if a+1 < a” seems like it could under a clever compiler (allowed to do what it wants in unchecked overflow) optimize to a completely removed branch but with less optimization it will not, so the addition is carried out and the wraparound means the branch is entered?

Seems that not checking for overflow and not being able to assume there is no overflow, would give the worst of both worlds (slower because of lack of some optimizations but still not safe against overflow like C#’s “checked”).

I thought a deref of a possibly overflown value was what could risk security, ie so long as all array indices and similar are range checked then nothing bad can happen?

> But are you saying there are usually never any issues with integer arithmetic and overflow vs. optimizations (reordering, common subexpressions etc)?

I'm saying that it is uncommon. For example, the example you gave works fine! Most examples work fine since you can do most of the same type of transformations. The exceptions are largely operations where the upper bits affect the lower bits. Division is one, and right shift is another, but nearly all the other bitwise and math operations do not have this property.

The other place where signed-wrapping-is-UB is often used for optimization is in things like loop bounds. Given something like:

    for (int i = start; i < end; i++)
Due to the UB of signed wrapping, the compiler can assume that initially i < end, and more importantly that the loop will iterate exactly (end - start) times, and that all accesses will be to contiguous, increasing addresses. This helps in vectorization, among other things.

In Java, the compiler couldn't take advantage of that. However, the impact isn't as big since in any loop that accesses arrays based on the index, bounds checks have to be made, and a typical pattern is to do some up front bounds checks [1] which can guarantee than the main body of the loop can then be run without additional checks: and this subsumes the checks that would be needed due to wrapping signed values anyways. So basically you have to do those checks anyways in many cases.

> A branch like “if a+1 < a” seems like it could under a clever compiler

Sure, but these cases aren't very interesting for comparing the effectiveness of the signed overflow optimizations. It's a case where the optimization breaks the (wrongly expressed) intent of the programmer, so the difference is only between a fast, broken program and a slower, perhaps correct one.

Presumably in the signed-overflow-is-UB world, such a check will have to re-written in a different way (which might even end up slower).

> Seems that not checking for overflow and not being able to assume there is no overflow, would give the worst of both worlds (slower because of lack of some optimizations but still not safe against overflow like C#’s “checked”).

Not exactly - it's just a different point on the spectrum. On the one side you have overflow is UB, which leads to some (fairly limited) optimization opportunities, but also more unexpected results and broken programs, and on the far other size you have overflow is an always-caught checked error. Java is somewhere in the middle: overflow is well-defined (it wraps), which gets you most of the speed of the UB approach (no checking needed, only a few relatively minor optimizations are lost) without the unexpected results of UB - but you still have broken programs when overflow actually occurs (unless it was unexpected). See also Rust's strategy here which is interesting.

> I thought a deref of a possibly overflown value was what could risk security, ie so long as all array indices and similar are range checked then nothing bad can happen?

When "nothing bad can happen" from the point of view of the JVM, i.e., the security sandbox isn't broken, you can't access arbitrary memory, violate the type system, interfere with unrelated code, etc. Of course, overflowing an index and then accessing into the array could still do plenty of "bad things" depending on the higher level semantics of the program, since you are now executing an unexpected code path. You might return sensitive data to an attacker, whatever.

---

[1] More details: https://wiki.openjdk.java.net/display/HotSpot/RangeCheckElim...

Thanks that's very kind to take the time to write that much!

> leads to some (fairly limited) optimization opportunities,

Ok, I was just mistaken in my belief that integer overflow shenanigans was a major contributor to how a modern compiler optimized e.g. loops.

> you can't access arbitrary memory, violate the type system, interfere with unrelated code

Right. I was considering the sandbox in the sense only of process security rather than program/type soundness.

> Ok, I was just mistaken in my belief that integer overflow shenanigans was a major contributor to how a modern compiler optimized e.g. loops.

Yes, there is some impact but I think it's large, at least based on looking at a lot of assembly, and going over the typical examples of where it helps. In the cases it does help, it could make a big difference on a loop, let's say 2x the speed, but these aren't all that common.

> Right. I was considering the sandbox in the sense only of process security rather than program/type soundness.

Usually those two things end up tightly bound together: it's hard to impossible to enforce a sandbox if the user can escape the type system.