Hacker News new | ask | show | jobs
by kagebe 4583 days ago
In my opinion, the main take-away here is that the proofs were, obviously, no proofs. If you program with modulo arithmetic you have to do your proofs with modulo arithmetic. If you use IEEE floating point, say goodbye to your theorems about real arithmetic.

If you forget/omit a single fact about your target platform/machine/api (whatever axioms you found your reasoning on) in your proof, it may be worth nothing.

2 comments

Yup.

More annoyingly, for a binary search over an array (i.e. something that can fit in memory) their code is still wrong - they should be using size_t.

On top of that, this 'fix' from the article, even if it were corrected to use size_t instead of unsigned int...

  In C and C++ (where you don't have the >>> operator), you can do this:
  6:             mid = ((unsigned int)low + (unsigned int)high)) >> 1;
Has the same bug as the original snippet, just in unsigned space. Now, granted, binary searching a >2GB byte array in a 32-bit process is an unlikely use case... but it is technically feasible!
Yeah. That "fix" immediately jumped out at me in the sense of WTF??? All he did was double the max size of the array before his code fails again!

That so-called "fixed" code just couldn't do anything useful in a 32-bit address space. A billion 32-bit ints completely fills up the address space by itself. As you note, perhaps it could "technically" be possible to search a billion 8-bit bytes, but that's not what's being passed in to the function.

And if he's running in a 64-bit address space (otherwise how could he pass in an array of a billion ints), then he should be using 64-bit integer arithmetic. (I don't know enough about Java to know how feasible it is to do that).

If you have an array >2GB (I understand this isn't possible in Java), you can't attempt to use ints as indices at all, you need size_t everywhere.
That's what I immediately wondered - why it's int and not size_t? Well, I guess in Java engine there could be technical reasons why it is int, but in general case of implementation you'd probably assume it's size_t and then only one of the fixes works.
Yes, best fix:

   int mid = low + (high - low) / 2;
As a theory guy I'd argue you're looking at it backwards.

The proofs are fine in that the correctness logically follows given the appropriate assumptions. In this particular example, it's implicitly assumed that integer overflow isn't a concern. Ideally, something like that should be explicitly stated, but let's be honest, Algorithms and Math in general have to be shorthand heavy - the alternative is painful and often not human readable.

It's always been the responsibility of the implementer to dive deep enough into the theory to grasp all the omitted assumptions. A less gratuitous headline might read - "Common implementation pitfalls: Integer Overflow"

Well, then I restate: "Those proofs were not proofs of the theorems they/Bloch wanted to prove." The point being: Proving is not just about your proof, but also about writing meaningful Theorems, or as it is more commonly called when programming: specification.

The implementer is not free of proving her implementation correct, given that the original algorithm was proven correct on some theoretical computational model. Bloch argues that, since the original proof is not sufficient, the implementer needs to test. I'd add, that if the implementer would adapt/redevelop the proof for her implementation, she might forgo testing. Of course, this is practically impossible for many execution environments.