|
|
|
|
|
by keenerd
5074 days ago
|
|
> Optimal is a binary search tree in an array, like the way heaps are implemented. Not quite. The heap layout has bad spatial locality. Think about the last row of leaves - it occupies the last 50% of the array. The next-to-last-row (all their parents) is in the middle 25%-50%. All those leaves are far away from their parents, and thus the heap is not optimal. Better is to cluster parents and children into small groups. With clusters of three, two thirds of the time a child will adjacent to its parent. |
|
The beautiful thing about the heap layout is that you can prefetch more than one level ahead. If your keys are 4 bytes, 16 of them fit on a cacheline, and you can prefetch 4 levels ahead with a single cacheline.
That means that if nothing is in cache, worst case each loop iteration will take < 20 ns.
With any kind of clustering (really, it's just a variation on B-trees) you touch less memory but you can't prefetch as far ahead.
In practice I suspect a 4-ary heap would perform better than a binary heap for my application, but there's a bunch of math I'd have to work out again that was hard enough for the binary tree version.
Not quite sure how what you're describing would work, but I can't see offhand how you'd implement it without losing the prefetching the heap layout gives you.