|
|
|
|
|
by leif
4991 days ago
|
|
The buffers have the same capacity at each level of the tree. It doesn't have to be that way, and you can find different strategies in the literature, but that's the way we do it. The degree of each internal node is way lower, because we want to use the space in internal nodes for buffers more than for keys. In a 16TB tree, we would probably have a tree of height 6 or 7. What we restrict is the size of a whole internal node, so that any one buffer in it can get much more full than the others if it needs to. We want to pick a node size that makes sense for your media. 4MB is a good compromise for spinning disks between their typical bandwidth and seek time, and it's large enough to give the garbage collector a rest on SSDs. |
|
Assume a node size that can hold 1M records in a buffer and 32 links (that's about 16 MiB more or less, more than what you suggest, but it simplifies the explanations).
Assume further that to operate on a node, you read it entirely in memory, modify it, sort it, then write it entirely on disk and that that takes about 1 second. This is pessimistic as there are surely more efficient ways to maintain sorted 1M nodes.
Now consider completely random insertions or 16-byte records and assume that the keys will spilt completely uniformly among the subtrees.
1. Every 1M insertions, you need to spill from level-1 (root buffer) to level-2, which generates about 32 IOs 2. Additionnally, every 32M insertions, you need to spill all level-2 buffers to level-3, which generates 3232 IOs n. every 32^(n-1) M.insertions, you need an additional 32^n IOs to spill from level-n to level-(n+1)
So, all in all, to insert 32^n M.records, you need a total of (n+1)32^(n+1) I/Os, which means 32(n+1) IO per 1M insertions.
For example, in a 16TiB data store, i.e. with n=8, you need 288 IOs per million insertions, or about 2400 random insertions/second at 1s/random IO.
In comparison, a B-Tree with a fanout 1024 and a block size 16KiB would generate about 4 random IOs per insertion. These IOs would be dominated by the seek time (~10ms), so you could get only 25 insertions per second.
On the other hand, a point query in this B-Tree would take 4 IOs instead of 8 random I/Os in the fractal tree. So, that's the tradeoff: much better insertion rate vs. slightly worse query time.
Note that a small fanout (about square root of the equivalent B-tree fanout) is important. If you take 1024 as fanout in the fractal tree, the tree has half the height, but the number of IOs per 1M insertions drops to 10245 instead of 32*9.
Have I got it right?