Hacker News new | ask | show | jobs
by zogomoox 899 days ago
Instead of cloning the branches all the way up to root, wouldn't it be more efficient to just write delta-records, so a structure per inner node that says "it's like this original node, but with the following modifications x,y,z"
2 comments

Efficient in what way?

Yes, a delta record would be smaller. But now you have to fetch both the new and the old, plus do computation to figure out what you have. This is trading off space and time.

I'm going to go to a reasonably low level to explain this.

Databases have generally found that the right tradeoff between space and time is to always work in terms of pages of fixed size. Now all reads are memory aligned, of known size. All writes are as well. And inside the CPU, processing a page in the most straightforward and predictable way possible is very fast. In particular you want to avoid complex logic that introduces too many pipeline stalls.

If you're going to work with pages anyways, you want to find ways to fetch as few pages as possible. Only have this page point to that page where you really need to. And put as much as reasonable on each page. The name of the game is to fetch as few pages as you need, and get everything you need from a page when you fetch it. Because when you're getting a new page, often you have to wait for a disk read. That's slow. It used to be really slow, you needed to wait for the right part of the disk to rotate around. Those disks went at something like 7200 rpm, but that means 120 revolutions per second, which means you're waiting for anywhere from 0 to 8.333... milliseconds for the read. Now consider reading through a million record database...

That's where a BTree comes in. It is a tree structure built around a page layout. When a page gets too full, it is split and one record is promoted to the level above. When the top page gets too full, a new level is created at top. That keeps it perfectly balanced at all times. So you can get to any record in very few reads. And if you're walking the whole data structure, a single page generally has many records.

You may want to rebalance a tree, or you may want to have multiple readers while a writer exists without relying on mutexes or locking.
as long as you don't modify the original nodes that would still work, no locks needed.