|
|
|
|
|
by fabian2k
1125 days ago
|
|
That was more detailed than I expected, a lot of posts on this topic tend to be more superficial. I suspect the BRIN index might need a bit of a stronger disclaimer, as far as I understand you really want to use that only for ordered data, and in those cases it is exceptionally good at its job. But it is a lot worse if that condition is not met. The post mentions this a bit, but with very soft language. I disagree a bit with "Don’t index columns if the table has little data", mostly because it doesn't matter in those cases. If the table is tiny the index is also very cheap (unless it's something really weird like a tiny table that is written at a very high frequency). And "little data" is just not specific enough for people to make decisions unless they already have a very good intuition on when the query planner would use such an index. A rather important part that isn't mentioned about multi-column indexes is which kinds of query can use them. That is probably not obvious if you never read about them in detail, but it's really important to know when defining them. |
|
BRIN is generally useful for datasets that have high local correlation. Ordered datasets have that, but it is not unique per se to ordered datasets. The summary type (operator class) you specify when creating the index is what defines which kind of correlation you need:
Minmax (the default for most indexable types in BRIN) needs values of the pages to be closer to eachother than to other ranges (a sorted table has this, but if you'd move the ranges around that would still hold). In the future, this may even be used to support topk-sorts.
Minmax-multi has similar needs as minmax, but has the ability to absorb some outliers in the ranges without losing precision immediately.
Bloom works well for equality checks and benefits most when only few values of the range of valid values are stored in each page range.
For instance, using the bloom operator class, you can use BRIN to exclude large ranges of the table at a fraction of the cost of scanning those ranges manually: it can quickly find out if the tuples in the table ranges might contain results with both date Y and user X, while only storing a fraction of O(sizeof table) instead of the O(num tuples) usually required for indexes.