Hacker News new | ask | show | jobs
by tromp 1326 days ago
The bug in question is trying to compute an average as

    avg = (x + y) / 2
which fails both for signed ints (when adding positive x and y overflows maxint) and for unsigned ints (when x + y wraps around 0). Note that this can only be considered a bug for array indices x,y when these are 32 bit variables and the array can conceivably grow to more than 2 billion elements.

I wonder what is the simplest fix if the ordering between x and y is not known (e.g. in applications when x and y are not range bounds) and the language has no right-shift operation...

6 comments

M = L + (R - L)/2

looks fairly simple to me. note this works of any ordering of R and L if the data type is signed.

But doesn't this have the same overflow issue, e.g. if R is a large positive number and L is a large negative one?
In binary search and heapsort, neither L nor R are negative - they are the number of elements in the array.
Binary search can be done on anything, not just arrays. Often you apply it to an algorithm and there isn't a collection at all, you just know the right answer is between some numbers so binary search lets you find it in logarithmic number of tries. If computing the number is costly then binary search is necessary to compute the result at all in those cases.
Yes, if you want a solution to a related problem then the implementation will need a better midpoint calculation.

However, the linked-to binary search starts from 0:

  1:     public static int binarySearch(int[] a, int key) {
  2:         int low = 0;
  3:         int high = a.length - 1;
and the promoted fix is:

  6:             int mid = low + ((high - low) / 2);
The aforementioned Wikipedia binary sort also takes a length, rather than start/end:

  function binary_search(A, n, T) is
      L := 0
      R := n − 1
C++ has had a correct binary search in its standard library since c++98, and it works on pointers, integers etc, both signed and unsigned without overflows. I'm not sure why they say that this doesn't exist.

https://en.cppreference.com/w/cpp/algorithm/lower_bound

You can use unsigned division and addition and then go back to signed and it is still correct.
Only on 2s-complement.
Which is basically everything.
Even if the data type is unsigned it suffices as long as the R - L term is signed (so the division by 2 is an arithmetic shift, not a logical shift).
Guess it depends on how you define "simplest"?

x / 2 + y / 2 + ((x & 1) + (y & 1)) / 2

Or

    x / 2 + y / 2 + (x & y & 1)
Edit: This is the same you wrote, but it gets the wrong number for negative values, for negative ints the rounding will go up and not down.
And this is exactly why I like to use higher level programming languages. Let someone smart figure all this out for me, and give me (grug) a generic binary search routine that works on arbitrary collections of arbitrary ordered things.
Very much relevant talk about std::midpoint in C++:

https://youtu.be/sBtAGxBh-XI

In general (x + (y - x) / 2) is more general than (x + y) / 2. If x and y are not in some group, but rather in the torsor of some group, you can't really sum them. Any attempt to do so involves introducing some arbitrary reference point. You can always do this, but once you do, you're at risk of your calculation results depending on the choice of arbitrary reference point and hence being meaningless.

The difference of two elements of the torsor of some group G is an honest-to-God group element of G, though, and so you have an honest-to-God identity element. You may or may not have an honest-to-God division or halving operator (which computes e given (e + e)) but in cases where G is the additive group of some field you do.

However, in this case our array indices are drawn from something like ℤ/2³²ℤ, and we might be trying to halve odd numbers, so none of this is justifiable! We want something different from our halving operator.

https://math.ucr.edu/home/baez/torsors.html

I see dataflow and maxiepoo were already talking about this: https://news.ycombinator.com/item?id=33493149

> The difference of two elements of the torsor of some group G is an honest-to-God group element of G, though, and so you have an honest-to-God identity element. You may or may not have an honest-to-God division or halving operator (which computes e given (e + e)) but in cases where G is the additive group of some field you do.

… some field of characteristic ≠ 2, of course.

Right, in GF(2ⁿ), e + e is always 0. Thank you for the correction.
The simplest fix is obviously to use 64 bit ints and call it a day.

  (x^y)/2 + (x&y)