Hacker News new | ask | show | jobs
by twotwotwo 2115 days ago
Like folks mentioned there, I wonder if a higher-fanout heap (people asked about 4-ary) might also do well in practice. Looking at Wikipedia (https://en.wikipedia.org/wiki/D-ary_heap), it looks like maybe so -- the growth in number of comparisons isn't that bad, and it mentions 4-ary heaps working well specifically.

(Like how linear search wins for small N, or insertion sort helps with small subarrays in introsort: more steps, but each step is also a lot cheaper on modern hardware due to fewer branch mispredictions, better cache locality, or something else that better fits the hardware's strengths.)

5 comments

I've tried D-ary heaps before (even min-max d-ary heap). Somehow my memory was that D = 3 or 4 sometimes performed better than 2, but now that I checked, in my code (which I spent a lot of times optimizing) I settled on plain binary heap at the end. So maybe my memory was faulty? Though I could swear I saw a performance improvement for D > 2 at some point. Sadly I don't recall why I reverted to binary heap exactly, or even whether it was related to speed or something else. Anybody tried it and remembered how it turned out?
Another technique to speed up priority queues in a system with a memory hierarchy (i.e. modern computers) is the https://en.wikipedia.org/wiki/B-heap . I wonder if the author of the article is aware of this and is able to benchmark this.
I don't have a formal background in CS. Any insight into why "d-" is used as the prefix?
It probably stands for "degree". In a tree, the degree of a node is how many children it has.

(And before anybody gets pedantic: technically, this is the node's "outdegree" when the tree is represented as a directed graph. If it was an undirected graph, we would have to count the parent node as well.)

There are only two important things taught in CS that most people don't seem to pick up on the job. The most important is order notation. The second is the relation between grammars and state machines. Both are worth as much attention as you can afford for a week.

Things not taught in CS that you need to know on the job are legion. By far the most important of these is the use of invariants. Second might be the memory hierarchy, and cache semantics. Third might be use of bitmaps and bitwise operations.

I'm learning about invariants in a relatively introductory cs course and I've been told they end up being relatively useless in actual practice. When have you found them useful?
Invariants are essential when designing a system, to ensure the resulting system can be understood by actual humans who will be responsible to maintain it.

They are right that invariants will not be essential for your assignments, at least for the first couple of years, although they would save you from some dead ends.

You will be able to code some of your invariants as assertions, but as often not. The ones you can't are the ones you have to pay the most attention to.

If the system invariants get too complicated, or hard to express, the design is wrong--reliably. That is when they are most valuable. Millions of systems designed without invariants could have been right, but we're stuck with what we got, instead.

It's the number of children each node can have
Maybe they're asking why the letter "d". That is, why "d-ary" instead of "n-ary" or "k-ary".
Yes, that was my intent. Why the choice of the letter "d".
The letter “n” is already used for the heap size, and “k” is typically a number <= n.
I remember doing tests years ago and find that the extra comparison and loop control can tank the performance for n-ary when n>2. But that was >10 years ago.
For very large heaps, I have had speedups of about 3x due to better cache locality with B-heaps.