| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by tromp 1326 days ago

The bug in question is trying to compute an average as

    avg = (x + y) / 2

which fails both for signed ints (when adding positive x and y overflows maxint) and for unsigned ints (when x + y wraps around 0). Note that this can only be considered a bug for array indices x,y when these are 32 bit variables and the array can conceivably grow to more than 2 billion elements.

I wonder what is the simplest fix if the ordering between x and y is not known (e.g. in applications when x and y are not range bounds) and the language has no right-shift operation...

6 comments

froh 1326 days ago

M = L + (R - L)/2

looks fairly simple to me. note this works of any ordering of R and L if the data type is signed.

mqus 1326 days ago

But doesn't this have the same overflow issue, e.g. if R is a large positive number and L is a large negative one?

eesmith 1326 days ago

In binary search and heapsort, neither L nor R are negative - they are the number of elements in the array.

Jensson 1326 days ago

Binary search can be done on anything, not just arrays. Often you apply it to an algorithm and there isn't a collection at all, you just know the right answer is between some numbers so binary search lets you find it in logarithmic number of tries. If computing the number is costly then binary search is necessary to compute the result at all in those cases.

eesmith 1326 days ago

Yes, if you want a solution to a related problem then the implementation will need a better midpoint calculation.

However, the linked-to binary search starts from 0:

  1:     public static int binarySearch(int[] a, int key) {
  2:         int low = 0;
  3:         int high = a.length - 1;

and the promoted fix is:

  6:             int mid = low + ((high - low) / 2);

The aforementioned Wikipedia binary sort also takes a length, rather than start/end:

  function binary_search(A, n, T) is
      L := 0
      R := n − 1

Jensson 1326 days ago

C++ has had a correct binary search in its standard library since c++98, and it works on pointers, integers etc, both signed and unsigned without overflows. I'm not sure why they say that this doesn't exist.

https://en.cppreference.com/w/cpp/algorithm/lower_bound

Jensson 1326 days ago

You can use unsigned division and addition and then go back to signed and it is still correct.

kevin_thibedeau 1325 days ago

Only on 2s-complement.

1over137 1325 days ago

Which is basically everything.

pclmulqdq 1326 days ago

Even if the data type is unsigned it suffices as long as the R - L term is signed (so the division by 2 is an arithmetic shift, not a logical shift).

dukoid 1326 days ago

Guess it depends on how you define "simplest"?

x / 2 + y / 2 + ((x & 1) + (y & 1)) / 2

Jensson 1326 days ago

Or

    x / 2 + y / 2 + (x & y & 1)

Edit: This is the same you wrote, but it gets the wrong number for negative values, for negative ints the rounding will go up and not down.

nerdponx 1326 days ago

And this is exactly why I like to use higher level programming languages. Let someone smart figure all this out for me, and give me (grug) a generic binary search routine that works on arbitrary collections of arbitrary ordered things.

leni536 1326 days ago

Very much relevant talk about std::midpoint in C++:

https://youtu.be/sBtAGxBh-XI

kragen 1325 days ago

In general (x + (y - x) / 2) is more general than (x + y) / 2. If x and y are not in some group, but rather in the torsor of some group, you can't really sum them. Any attempt to do so involves introducing some arbitrary reference point. You can always do this, but once you do, you're at risk of your calculation results depending on the choice of arbitrary reference point and hence being meaningless.

The difference of two elements of the torsor of some group G is an honest-to-God group element of G, though, and so you have an honest-to-God identity element. You may or may not have an honest-to-God division or halving operator (which computes e given (e + e)) but in cases where G is the additive group of some field you do.

However, in this case our array indices are drawn from something like ℤ/2³²ℤ, and we might be trying to halve odd numbers, so none of this is justifiable! We want something different from our halving operator.

https://math.ucr.edu/home/baez/torsors.html

I see dataflow and maxiepoo were already talking about this: https://news.ycombinator.com/item?id=33493149

JadeNB 1325 days ago

> The difference of two elements of the torsor of some group G is an honest-to-God group element of G, though, and so you have an honest-to-God identity element. You may or may not have an honest-to-God division or halving operator (which computes e given (e + e)) but in cases where G is the additive group of some field you do.

… some field of characteristic ≠ 2, of course.

kragen 1325 days ago

Right, in GF(2ⁿ), e + e is always 0. Thank you for the correction.

im3w1l 1325 days ago

The simplest fix is obviously to use 64 bit ints and call it a day.

nwallin 1325 days ago

  (x^y)/2 + (x&y)