Hacker News new | ask | show | jobs
by bigato 3478 days ago
Supposing the case in which all partitions are on the same disk and that you manage to index your data well enough according to your usage that postgres does not need to do full table scans, are there any additional performance benefits on partitioning?
4 comments

Well, anything that reduces the size of a search space helps performance.

Partitioning can drastically improve query times because the planner can use statistics only from a single partition (assuming the query works on a single partition). Postgres uses (among other things) a range histogram of cardinalities to determine the "selectivity" — how many rows a query is likely going to match. If you have a 1B-row table and you're looking for a value that only occurs once (low cardinality), the statistics won't help all that much. But if you partition it so that you're only looking at 1M rows instead of 1B, the planner can be a lot more precise.

Another point is cache efficiency. You want the cache to contain only "hot" data that's actively being queried for. If most of your queries are against new data, then without partitioning, any single page would likely contain tuples from all time periods, randomly intermixed, and so a cached page will contain a lot of data that's not used by any query. (Note that if you use table clustering, which requires the regularly running of the "CLUSTER" command, then you can achieve the same effect at the expense of having to rewrite the entire table.) If you partition by time, you'd ensure that the cache was being used more optimally.

Write access is also helped by partitioning by cold/hot data: B-tree management is cheaper and more cache-efficient if it doesn't need to reorganize cold data along with the hot. And smaller, frequently changed partitions can be vacuumed/analyzed more frequently, while unchanging partitions can be left alone.

1. Flexibility/freedom to distribute partitions in the future if needed.

2. Indexing doesn't work well in all cases. You can be better off scanning entire small partition tables lacking an index on a given column than with a single very large table whether that column has an index or not. (Indexes take up space and need to be read from disk if they don't fit in a memory cache, indexes don't work well for low-cardinality columns, etc.)

3. There are operations you can parallelize on a large number of small/medium tables and perform faster or more conveniently than a single very large table. One of my favorite techniques:

# usage: seq 0 255 |parallel -j16 ./alter-tablespace.sh {}

hexn=`printf "%02x" $1`

psql -Atc "ALTER TABLE tablename_${hexn} SET TABLESPACE new_tblspace" -hxxx -Uxxx xxx

4. A nice side effect of properly/evenly partitioned data you get for free is that you can do certain types of analysis on a single partition (or a few) very quickly and have it represent a sampled version of the data set. You can think of it as another index you get for free.

To add to responses that you already got there's also a nice use case that partitioning helps with.

When you have table that you constantly inserting large amount of data, and simliarly you are removing old data at the same frequency (i.e. only care about month of data).

If you set partition for example per day, it's way faster to drop old tables than performing a delete.

Yes. Less latch contention for nodes of a single btree index, for instance.
I didn't know what latch is, so I googled it and found a nice explanation:

https://oracle2amar.wordpress.com/2010/07/09/what-are-latche...

"A latch is a type of a lock that can be very quickly acquired and freed."

That brings me a couple more questions:

1. May I infer then that the only benefit from partitioning the table (fully located on the same disk) that can not be achieved by indexes is that queries will wait less time for this kind of lock to be released?

2. May I assume while a table is only being read and not changed, there's no performance gain from partitioning a table (fully located on the same disk) that can not be achieved by indexes?

There are other possibilities as well. For example, if your partitioning strategy is such that it improves the selectivity of an index, it could improve query plans for queries that were on an index that was less selective. As an example, I once had a table with over a billion rows distributed among ~100 tenants on which queries were typically run by tenant and date range. Partitioning that table by tenant dramatically improved the performance of those queries because those queries no longer had to scan through rows of which only ~1% were for the tenant of interest.
If you had built composite index instead, wouldn't it work just the same towards improving the performance? Well I can see that an composite index would occupy more space while partitioning wouldn't, and that should be something to take into consideration. But performance-wise, wouldn't it be the same?
No, for example, a composite index on (tenant, date) would be highly non-selective whereas an index on date in each individual tenant partition is highly selective and therefore higher-performance (in my case, much higher performance).
What if you made an index on (date, tenant) instead?
> 1. May I infer then that the only benefit from partitioning the table (fully located on the same disk) that can not be achieved by indexes is that queries will wait less time for this kind of lock to be released?

See below for many examples (I originally tried to split up the read and write benefits, but that usually isn't very meaningful since speeding up writes often speeds up reads indirectly (by enabling more indices to be maintained, releasing resources faster, leading to reduction in index size, or improved statistics).

> 2. May I assume while a table is only being read and not updated, there's no performance gain from partitioning a table (fully located on the same disk) that can not be achieved by indexes?

No. First, latch contention occurs even on fetch--latches are much lower level than the kinds of logical locks that get taken on updates and are generally required in order to enforce database invariants (in Postgres btree indices they live at the page level). Second, there are many other benefits.

An important one is separate maintenance of planner statistics. When a table has enough entries, Postgres's "most frequent values" list ends up not being able to fit all the rows with large frequencies (this can be tweaked up to storing 10,000 such buckets, but eventually even this is not enough if your table is very large). It then has to resort to its histogram, which normally only works with columns with a natural sort order. Since Postgres bases many important decisions during query planning on these statistics, you will often see order-of-magnitude improvements in query performance by improving table statistics. Note that composite indices do not keep separate statistics "per prefix" the way partitioning does (that is, an index on (a,b) does not track statistics of column b "within" column a); they just keep statistics on the individual covered columns or expressions. Not tracking cross-column correlations has a large impact in practice if your columns are not uniformly distributed throughout your table (and you have enough rows).

Another potential benefit (depending on your partition key) can be reduction in index size. For example, if you partition a table into a different segment for each tenant in your organization, you can use selective indices that don't include the tenant id. Note that you can already do this with partial indices, but Postgres does not use their statistics, which makes them much less useful, and large numbers of them can degrade planner performance (moreover, until recently, it didn't even use them for index-only scans, but I believe this was fixed in 9.6). Reduction in index size allows more indices (or more nodes of the index) to fit into memory (or even sometimes cache, for internal nodes), which can lead to dramatic speedups, especially if you can use an index-only scan. Smaller indices are also much faster to build, maintain, and scan.

The combined index size of indices in a partition is also slightly smaller (by a factor of around log(n) / log(n/p)) but that isn't really that significant (for 2^30 (over 1 billion) tuples, and 100 partitions, it is still just a 1.3x improvement, and can easily be overwhelmed by the size of the extra metadata for smaller numbers of tuples). For much the same reason, updating 100 100-entry histograms / most-frequent-value sets can be (slightly) faster than a single 10,000 entry one.

Another one is that if you are fetching more than one row at a time from a partition, keeping them (more) clustered on disk and in RAM (remember that fetching a page of RAM that's not in cache is actually quite slow!) means substantial savings in I/O and greatly increased likelihood of page buffer hits, which leads to order of magnitude savings. This is probably the biggest improvement from partitioning, and it applies to more complicated queries than point lookups (such as range queries and bitmap index scans). There are also some types of index (notably BRIN) that are most effective if your data distribution is such that you know you tend to have monotonically increasing keys over the entire table. In multitenant situations, these become much more useful if you can restrict them to a single partition, and allows you to again save index space.

There are other benefits to overall database performance (in the same vein as btree latching) from being able to address different tables in parallel, such as vacuuming being able to be performed separately on different tables (there are also some drawbacks, like operations that perform metadata scans [also VACUUM!] taking longer, but maybe this is improved in Postgres's new scheme--I'm not sure).

One that I don't see people talk about a lot, but is critically important for our usecases where I work, is that Postgres's implementation of SSI (serializable snapshot isolation), though it tries to use gap and row-level locking, will conservatively upgrade to page-level locks, and sometimes even table-level ones. If other updates are going on in the database at the same time, this can cause false conflicts, leading to many unnecessary aborts and retries. If data are partitioned and updates tend to be specialized to a partition, the chances of this happening are greatly minimized.

Finally, even though it contradicts your original premise: requiring an index to be used for every operation is asking for trouble. Making scans fast pays off in spades (especially if you have arbitrary custom filters in queries) and can free up your programmers to focus on things other than database performance :P A combination of intelligent partitioning (where appropriate), some understanding of Postgres's internals, and very little else can allow servicing surprisingly large numbers of users with varied query requirements, without creating many indices that aren't directly implied by the data model.

Hopefully, this gives you some sense of why people are excited for this feature :)