Hacker News new | ask | show | jobs
by mioelnir 3478 days ago
First, if your query used the partition key in its where clause, the database knows(can calculate) which partitions can have a result and which can not. This means smaller indexes/less data to scan to find the result.

In the MSSQL case - not sure about others, this is were I had to use it - you can also switch data segments between tables indexed over the same partition function and with the same DDL. So you recreate the existing table a second time, create all the required indexes on it (which is fast because the table is empty), and then you switch partitions between them basically via pointer manipulation. The empty partition is now in the normal table, the data partition in the recreated one. Then you drop table on the recreated table. This is much more IO efficient than a delete-from statement.

This switching of course allows for a lot of other fun stuff as well, where you switch out a partition with a couple million rows, then work on it in isolation, switch the partitions back and then only have to "replay" the few rows that hit that partition while they were switched. Which is easy because they are now in the shadow table which is not updated further.

It is of course data and application dependent if you can use these things without affecting your application; but if it is suitable, the gains can be immense.

1 comments

So this won't help when you have something like a deleted flag and need to join on data that could be in either partition?
Yes and no. It won't help in the sense that the query planner can not identify the affected partitions. But it can help if the database supports parallel index scans, since instead of one big index, it has to scan n smaller indexes which can be done concurrently.