Hacker News new | ask | show | jobs
by ajuc 531 days ago
The problem is that it's not that much harder to make it work for all the valid inputs.

Not doing that is not good enough.

Another example is summing lots of floats naively instead of using Kahan's algorithm.

It's like we had a theory of gravity that doesn't work on Mars because we have unnecessary division by (m-Mars' mass) in our laws :) It wouldn't be good physics.

4 comments

It's not much harder in this toy example. In real examples what this looks like is a ton of bikeshedding and arguing about minutiae instead of improving the system in ways that are orders of magnitude more valuable to everyone. The truth is it's far easier to waste time on this mostly meaningless crap, exchanging one exception for another, than it is to think deeply about the problem you're actually trying to solve. It's lazy.
In real world you're not implementing binary search but using one implemented already.

Hopefully implemented by someone who cared enough to avoid bugs or at least document the range of the arguments.

In the real world I don't have an array with over 2^30 elements 99.75% of the time. If I did need to process an array with over 2^30 elements, I'd have dozens of reasons to worry that standard techniques might start failing, and would therefore carefully consider and test everything to a much higher degree. I'd want a safety margin of at least 10x, which means I'm designing for 2^33+ and already know I need to avoid int32 everywhere I can. The type signature of a binary sort returning int32 in such a case should already be triggering alarm bells.

This is a bit like arguing that every building humans have built is flawed because it wouldn't withstand a significant meteor strike. That's not how we do it. We consider maximum likely use cases and then build in safety margins of at least 10^2, sometimes up to 10^6. If you think there's a possibility you're dealing with hundreds of millions of elements, you stop using int32 entirely. You don't wait for 100 million to turn into 4 billion and crash.

Is it worth making the algorithm slower just to have it work on extreme edge cases?
> Is it worth making the algorithm slower just to have it work on extreme edge cases?

Yes! If an API has no documented (and preferably enforced) limits on its arguments, the caller has every right to assume the full range of the argument types is supported.

Else it is very much a bug.

“Be reasonable, dude” is ok when you order 1,000 beers. It’s not an appropriate response when calling a binary search API.

Is it worth to make the algorithm faster at the cost of throwing surprise OutOfBounds exceptions in some extreme edge cases?

Maybe, but only if you - and only you,and not an attacker can control the case you are in.

If an attacker can somehow make sure that there is an array with 2³⁰ elements, you have worse problems than a binary search crashing.
Why do you think this algorithm only applies to arrays? Why do you think this algorithm doesn't apply to this sine lookup table that the compiler placed at the end of the memory in the microcontroller?
Because the article is about binary searching an array. Obviously algorithms must be adapted to what they are used for.
Sure, because my pre-computed table isn't an array. It's just a bunch of numbers in consecutive positions in memory. That has never been an array.

Also, what makes you think I'm not adapting the algorithm? I mean, I set the search indexes at the address of the beginning and end of the totally-not-an-array array so as to not have to do a relative memory read, because those are not supported in my microcontroller, or if they are supported, it's an extra cycle, and my timing budget is quite tight. Also, I don't do a div by 2, but a shift right by 1, same reason.

It's as simple as replacing one NULL character with something else. Or 1 length field.
If that happens, you’re going to get an out-of-bounds error no matter how you implement your binary search.
In Java. In C you just access random memory values (or segfault).
Well it's either that or document the domain.
Nice example with the 1/(m-m_mars)!
+1 for mentioning Kahan.