Hacker News new | ask | show | jobs
by justin66 1591 days ago
See also: "Nearly All Binary Searches and Mergesorts are Broken" by Joshua Bloch. The cluefulness or otherwise with which people often react to Bloch's excellent post is not something to ponder very closely if you want to retain any hope in the future of software engineering.

https://ai.googleblog.com/2006/06/extra-extra-read-all-about...

https://news.ycombinator.com/item?id=3530104

https://news.ycombinator.com/item?id=1130463

https://news.ycombinator.com/item?id=14906429

https://news.ycombinator.com/item?id=6799336

https://news.ycombinator.com/item?id=9857392

https://news.ycombinator.com/item?id=12147703

https://news.ycombinator.com/item?id=621557

https://news.ycombinator.com/item?id=7594625

https://news.ycombinator.com/item?id=9113001

https://news.ycombinator.com/item?id=16890739

If doomscrolling all that isn't enough to make you fear for mankind's future I'm pretty sure there's an Ulrich Drepper glibc bug report rejection related to this topic (or several) that you can google...

On topic: Raymond's post has some other great stuff. SWAR!

4 comments

I want to reply to one of the comments you linked to, which is this:

> I would argue that the bug is not in the algorithm -- the bug is in languages that don't detect integer overflow by default.

Concretely, this is true enough. But abstractly, not so much: the algorithm is actually "buggy" if you abstract the problem a little. Namely, finding a midpoint of two operands does not require that the operands be numbers, or even addable for that matter. The introduction of that requirement is therefore a bug (at least in my eyes). The easiest way to see this is to replace integers with pointers. Then adding two pointers isn't even a well-defined operation in the general case, let alone dividing them by two. Whereas subtracting them and moving half the distance is actually quite well-defined, and we can see it behaves better too.

I would probably go so far as to claim that this is not an isolated example of where thinking about problems more abstractly helps us come up with solutions that have non-obvious benefits.

Subtraction, division and addition is one of the common answers that is still wrong, unless you also want to do a comparison, first, and that is generally high cost.

Read https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63303 to see many problems around pointer differencing.

A comparison never costs more than an addition or a subtraction.

If you would use a conditional jump, that would have a high cost.

However the maximum or minimum should always be computed without conditional jumps and many CPUs have special instructions for max and min, which are not more expensive than additions or subtractions.

On CPUs without max & min instructions, computing max or min requires 2 instructions (compare + conditional copy).

2 instructions vs. 1 instruction increases the program size but not necessarily the execution time, if the instructions can be overlapped with others.

Due to the complex architecture of modern CPUs, it is impossible to determine the cost of a simple sequence of instructions in the general case.

For each particular CPU, a different but equivalent sequence of instructions can be the best and longer sequences of instructions may happen to be executed in less time, if they can be better overlapped on a certain CPU.

The obvious form of the code with a comparison still produces a conditional branch on latest gcc [1]. It's extremely doubtful that you'll find a version that uses any comparison that consistently performs as quickly as any version that uses a little bit-twiddling, no matter what modern CPU you're talking about.

Many of your statements are misleading in context. Implying that you can't know or deduce things about the cost of a simple sequence of instructions is very odd. All software and people that work on optimization do it all the time.

And remember that the context is a suggestion that pointer types and pointer-subtraction is the answer to a question about integers, so getting into detail about instruction sequences isn't really going to help, as the basic idea is flawed.

[1] https://gcc.godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(file...

The code generated by gcc in this case is just bad.

It is known that gcc fails to do "if conversion" in many cases when it should.

The correct code using the CMOV instruction and only a single computation of the expression could easily be written with inline assembly.

However, if inline assembly is used, there is a much simpler solution using the carry flag, which was presented at the end of the article that started this thread.

In reality, all the high level language solutions that have been listed in the article are very bad in comparison with the right solution using inline assembly.

The solution using inline assembly and the carry flag is actually applicable to any CPU, with the minor inconvenient that the mnemonics for the instructions could vary (which can be handled with defined macros), because all have carry flags and on all CPUs this solution will be the shortest and the fastest.

For the high-level solutions, which is the best of them can vary from CPU to CPU, which was my point.

And I'll reiterate the original point, the pointer-operations (with comparison) solution is not going to be the best in most any case; you admit that it's not the fastest for inline assembly, which you claim is the right solution. It's not going to vary from CPU to CPU - comparisons in most compilers will become branches and cause a stall and the best case (cmov) is still going to be slower than a shift-from-carry or the (a&b)+(a^b)/2 version.

We don't need to admit defeat in optimizing such a simple case. A comparison is unnecessary and will pessimize, compared to a little bit-manipulation.

> I was shocked to learn that the binary search program that Bentley proved correct and subsequently tested in Chapter 5 of Programming Pearls contains a bug.

I haven't seen the mentioned proof, but if said proof is not formal and mechanized and/or does not consider all possibilities, including overflow, then should we really consider it to be a proof of correctness? It might prove some desirable properties, but I don't think we should leave it at that. I certainly don't think we should claim that "It is not sufficient merely to prove a program correct; you have to test it too."

When it comes to software programs, I believe proofs can and should be exhaustive. That does not necessarily mean you need to exhaustively test every input, but it does mean you need to prove correctness for every input, including inputs that might result in overflow or undefined behavior. Otherwise, we should not consider it a proof of correctness.

It was a formal proof, but one based on a false assumption, namely that x+y for two ints x and y is an int and a well-defined operation. It's not.

The trust in the significance of a proof is limited by the trust in the statement to be proven. The statement to be proven was "If XYZ assumptions about the statements in the underlying language hold, then the result of that function is the result of a binary search". The proof is correct. Just that XYZ contained something incorrect.

There is no free lunch. Even if we prove all our software formally correct, we still need to "program" the specifications, with all that comes with it. They can contain bugs, need to be debugged, revisited, etc. The above is an excellent example of this.

The Java solution is simpler because they are finding the average of 2 positive signed integers. So if you add 2 31 bit positive signed integers the result will fit in 32 bits and then you can just do an unsigned right shift to get the average.
Quite “amazing” that googleblog layout breaks on iOS. It’s literally impossible to see half of the text without the reader mode.
Yeah iOS is becoming the new IE. No matter how much people complain about google's chrome domination they at least try to keep up with the standards. iOS browser does not and they even lock the devices to their browser so you can't even choose a browser with a different engine
this isn't an iOS issue. That blog post is broken on Firefox Android as well, and in Chrome dev tools. There is a display inline-block messing it up. The newer blog posts display fine.