HN Mirror


by leif 4988 days ago

The argument I like goes like this:

Flushing one buffer one level down costs O(1) I/Os. It moves B elements, so flushing one element one level down costs O(1/B) amortized I/Os. The tree has fanout O(1), so it has height O(log N). Therefore, each element must be flushed O(log N) times before it reaches its target leaf. So the cost to get one element down to its leaf is O((log N)/B) amortized I/Os.

Compare this with a B-tree, which costs O(log_B N) = O((log N)/(log B)) per insertion.
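To put rough numbers on that comparison, here is a minimal sketch that evaluates both bounds with the hidden constants dropped. The function names and the sample values of N and B are illustrative assumptions, not measurements from any real system.

```python
import math

def buffered_tree_insert_cost(n, b):
    """Amortized I/Os per insert for the constant-fanout buffered tree:
    (log2 N) / B, ignoring the constants hidden by the big-O."""
    return math.log2(n) / b

def btree_insert_cost(n, b):
    """I/Os per insert for a B-tree: log_B N = (log2 N) / (log2 B),
    again ignoring constants."""
    return math.log2(n) / math.log2(b)

# Assumed sample sizes: about a billion elements, 1024 elements per block.
n = 2 ** 30
b = 2 ** 10

buffered = buffered_tree_insert_cost(n, b)
btree = btree_insert_cost(n, b)

# The ratio simplifies to B / log2(B), independent of N.
speedup = btree / buffered

print(f"buffered tree: {buffered:.4f} I/Os per insert (amortized)")
print(f"B-tree:        {btree:.4f} I/Os per insert")
print(f"ratio:         {speedup:.1f}x")
```

The ratio of the two bounds is B/(log B), so the gap depends only on the block size and not on N.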

I'm pretty sure your math is right. It attacks the problem from the other direction and I've only skimmed it, but it seems sound.

The point query tradeoff is there in the theory, you're right, but it assumes a cold cache. In practice, most people have enough RAM to hold all their internal nodes, whether they're using a B-tree or a fractal tree, so a random point query typically costs 1 I/O in either data structure.

Be careful with asterisks here. :)


alex137 4987 days ago

Good.

Now, what is the recommended layout of the node buffers (the 4MiB buffers)? I've read about Packed Memory Arrays, but don't quite get the complete idea. Can we really do better than read, merge, write?

(A pity there's no preview button on this forum...)


leif 4987 days ago

Unsorted is fine. We're just counting I/Os, and it takes O(1) I/Os to read or write a buffer regardless of the layout.
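A minimal in-memory sketch of what an unsorted buffer could look like, assuming a node with sorted pivot keys and a fixed buffer capacity. The `Node` class, its fields, and the capacity are hypothetical names for illustration only, not how any real fractal tree implementation lays out its nodes.

```python
import bisect

class Node:
    """Hypothetical internal node with an unsorted message buffer."""

    def __init__(self, pivots, capacity):
        self.pivots = pivots      # sorted keys separating the children
        self.capacity = capacity  # messages held before a flush is due
        self.buffer = []          # pending (key, value) pairs, arrival order

    def insert(self, key, value):
        """Append without sorting or merging; return True when full."""
        self.buffer.append((key, value))
        return len(self.buffer) >= self.capacity

    def flush(self):
        """Partition the buffer by destination child in one pass.

        This is CPU work on a buffer that is already in memory, so it
        does not change the I/O count of the flush.
        """
        per_child = [[] for _ in range(len(self.pivots) + 1)]
        for key, value in self.buffer:
            child = bisect.bisect_right(self.pivots, key)
            per_child[child].append((key, value))
        self.buffer = []
        return per_child

# Usage: two pivots give three children: keys < 10, 10..19, and >= 20.
node = Node(pivots=[10, 20], capacity=4)
full = False
for key, value in [(25, "a"), (5, "b"), (15, "c"), (7, "d")]:
    full = node.insert(key, value)
batches = node.flush() if full else []
```

Whether the buffer is kept sorted only affects CPU time per flush; in the I/O model above the cost is the same either way.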
