|
|
|
|
|
by koolba
3397 days ago
|
|
I think I saw a proposal recently for something that would cover this use case. IIRC it was for an index organized table that stores the entire contents in a btree (so it would naturally be stored in primary key order). I don't think there's been any work on it yet though. |
|
A B-tree can in no terms be described as being laid out on disc in primary key order. The individual pages of the tree are placed on disc randomly, as they are allocated. Therefore an index scan won't return the rows in index order as quickly as the current scheme of having the rows separate from the index and sorting them every now and again.
Ultimately, for the goal of fast in-order scan of a table while adding/removing rows, you need the rows to be laid out on disc in that order, so that a sequential scan of the disc can be performed with few seeks. This requires that inserted rows are actually inserted in the space they should be, which is not always possible - often there isn't space in the page, and you don't want to spend lots of time shifting the rest of the rows rightwards a little bit to make space. To a certain extent Postgres already does insert in the right place if there is space in the right disc page (from deleted rows), but because this is not always possible, the solution is to re-CLUSTER the table every now and again.
I think the Postgres way is actually very well thought out.