Hacker News new | ask | show | jobs
by rustybolt 1561 days ago
I agree that binary search is prone to errors, but the overflow error is just calculating the midpoint as

    (low + high) / 2
I mean, that's technically true because it might crash for huge arrays, but if this is a bug then the follow implementation of a function that adds to integers is also buggy, since it will overflow when x + y doesn't fit in an int:

    int add(int x, int y) { return x + y; }
6 comments

An important difference between your add method and binary search is that the signature of your add method already implies the contract that the sum of the two integers must fit into an int because there simply is no correct value of the specified return type that could be returned otherwise.

There's nothing about the signature of an ordinary binary search method that would imply that it only works for arrays that have less than MAX_INT/2 elements.

True, but you can make the same case for the addition operator which might have a type annotation somewhere, but it's certainly not visible to most programmers.

Simply put, in many languages there is no addition operator which does mathematically correct addition and that is a sad state of affairs.

This is a modern phenomenon.

In the first few decades of electronic computers and programming languages, having an addition or any other arithmetic operation that would not signal correctly the overflow exceptions would have been considered as completely unacceptable.

Computers without the right hardware implementation appeared initially among the so-called minicomputers and microcomputers, i.e. the cheapest computers, which initially were intended for things like industrial control or peripherals for larger computers, where it was supposed that competent programmers will take care to use appropriate workarounds for the hardware limitations.

Unfortunately, this kind of implementation of the arithmetic operations, without appropriate means for detecting overflows, intended initially only for the cheapest products, has spread over the years to all CPUs.

Even if from time to time there are news about some horror story caused by a combination of weak hardware with the lack of appropriate software checks, it appears that there is no hope that this unfortunate hardware design fashion will ever be reversed.

The processors I've used all have an overflow flag that will tell you if an addition result exceeded the size of the register. But I'm not aware of any compilers that will use the flag, because it adds overhead that isn't wanted or needed 99.99% of the time.
> Simply put, in many languages there is no addition operator which does mathematically correct addition and that is a sad state of affairs.

I read a book on Clojure when it was fairly new containing a spirited defense of the fact that arithmetic operators like + and - always returned the correct result. This was slower, because they needed to do bounds checking, but the result was always correct. If you wanted faster arithmetic with bugs, you'd use the explicit operators +. or -. (or *. or, presumably, /. -- I'm not sure how division was handled).

Shortly after that, Clojure reversed its policy and + will give you fast addition with bugs.

None, I think. Best case you have somewhat graceful handling on out of memory, but that handling isn't going to give you the result of the addition.
I don't think this is an unfair gotcha. If you have u32 for low and high, and someone gives you an array with 4 billion entries, you can return the correct value if you compute the midpoint correctly.
It's more comparable to:

  int add(int x, int y) { return (2*x + 2*y) / 2; }
If ints are 32 bit you would expect to be able to add 1 to 2^20, but that causes an overflow with this implementation. (low + high) / 2 is arguably worse because you turn the algorithm from working on any array into one that only works on arrays that are small enough not to cause the overflow.
Instead of

   (low + high) / 2
I use

   low + (high-low) / 2
to avoid any possible overflow.
I think that may actually be considered buggy, and the actual definition should be

   long add(int x, int y) { return x + y; }
with the caveat of course that `long` is larger than an `int`, which I can't recall if it is valid on all platforms. So mathematical u32 + u32 should be u64, but you can define an `add` function that has strict overflow behavior.
Shouldn't you write

    return (long)x + y;
to ensure the addition happens after the conversion to long?
Yes, this is essentially the case I'm making: if this implementation of binary search is buggy, then so is the addition operator in C (and many related languages).
They're not comparable. Trying to compare them suggests that you don't understand what the issue is with the `(low + high) / 2` case, or you don't want to acknowledge it.

Simple addition will fail if the result of the expression should fall outside the range of integers expressible for a given integer width—because it can't fit. That's not insightful. It's facile. The criticism of the naive midpoint computation is not facile.

The problem with using `(low + high) / 2` to compute the midpoint (instead of `(high - low) / 2 + low`) is that it will give invalid results even when the data type from the signature is wide enough that the expected result can fit.

No, it's just slightly unintuitive. That's not the same thing.
That might not be enough. On 64-bit linux, both int and long are 32 bits.
You meant on 64-bit Windows.

On most 64-bit UNIX-like systems, long is the same with long long.

Microsoft are the kings of backward compatibility. Since long has been 32 bits for the last 20 years, it shall remain so until the end of time. Linux compilers are free to be a little more pragmatic. The standards are flexible enough to allow either behavior.
In Microsoft C++ long and int are both the same size, 32 bits. Even when size_t is 64 bits.
Also, no integer overflow in Python, so no bug here.
Very true! Which is also why I used a C-like syntax in the snippet ;)