|
|
|
|
|
by koverstreet
5071 days ago
|
|
There's a tradeoff between latency and amount of memory touched. The beautiful thing about the heap layout is that you can prefetch more than one level ahead. If your keys are 4 bytes, 16 of them fit on a cacheline, and you can prefetch 4 levels ahead with a single cacheline. That means that if nothing is in cache, worst case each loop iteration will take < 20 ns. With any kind of clustering (really, it's just a variation on B-trees) you touch less memory but you can't prefetch as far ahead. In practice I suspect a 4-ary heap would perform better than a binary heap for my application, but there's a bunch of math I'd have to work out again that was hard enough for the binary tree version. Not quite sure how what you're describing would work, but I can't see offhand how you'd implement it without losing the prefetching the heap layout gives you. |
|