Hacker News new | ask | show | jobs
by gtrubetskoy 4687 days ago
I too "discovered" B-Trees (and B+Trees) recently and wrote about it: http://grisha.org/blog/2013/05/11/relational-database-on-top...

I even implemented a key/value backed SQL database (using Thredis, which is Redis and SQLite) http://grisha.org/blog/2013/05/29/sqlite-db-stored-in-a-redi...

SQLite3 has an awesome C implementation with copious comments if you really want to understand them, BTW.

The most fascinating thing about B-Trees is that the entire data structure is stored in equally sized blocks (pages), and getting a page by number is a very easy operation in block storage, which is what our RAM and hard drives are (even though back then they were tapes and punch cards).

Few people realize that it's not just DB's that use B-Trees, our filesystems are also largely B-Tree based.

It's remarkable how few today's "developers" understand how they work or even know what they are, yet they are so fundamental to everything computers do.

Without B-Trees our present world of computing wouldn't exist as we know it. It could be the coolest data structure ever :)

1 comments

I've read your first link and I don't get why you talk about "A relational database needs to store rows in order". AFAIK you have to add an "order by" clause to ensure ordering of any kind.

Relational algebra is probably not dependent of ordering for the common operations. Of course, the actual implementation of a relational database might need ordering of keys to be reasonably efficient. Is that what you meant?

The question is "what in order" means. The DB needs to store rows in some order that makes sense to it, but that might or might not match whatever order a user might want.

That's where ORDER BY comes in: presenting the data in some order that makes sense to the user. That might be completely different from what's stored in the DB. But the DB needs its own order, in order to be able to access things in a reasonably efficient manner.

To put it another way, the SQL standard doesn't guarantee any particular ordering. This does not mean that table storage must be unordered: it means that a database can store them in any order it wants, which might not be any order at all, or it might have some particular kind of order which the DB depends on to do its work.

But as a database user, you cannot count on that order matching what you, personally, would have implemented if you'd had a chance. ORDER BY clauses allow you to tell the database to sort the results of a query according to something appropriate for your particular needs, but this has no effect on how they are stored.

It needs to store rows in order to support indexes, so that a row could be found without having to scan the whole table, and that's exactly what B-Trees let you do, and also add or remove rows without having to rewrite the whole table, but only affected blocks.
What you say, isn't that more or less exactly what I wrote?