Hacker News new | ask | show | jobs
by magicalhippo 1296 days ago
A composite index can also be used for a partial index scan[1], so if you're frequently looking for just int100 as well as the int100,int1000 combination (as per the article's example), then a composite index int100,int1000 can be used for queries just filtering on int100.

The order of the columns in the composite index might matter then. We got some nice savings by reordering columns in indexes and changing the join order in queries (our DB wasn't that smart) or adding a "useless" filters to the where clause, allowing us to consolidate multiple indexes into one composite one.

[1]: might be using the wrong terminology here, I'm not talking about a partial index[2], but using only the first N dimensions of a M-dimensional index.

[2]: https://www.postgresql.org/docs/current/indexes-partial.html

1 comments

There are some DBs where the order of fields doesn't matter in the compound index if you're just searching by one field (but of course is less efficient). I think Oracle supports this.
All can search an index by just one field if it is the first field covered by the index, few can use an index if your predicate concerns the second and not the first.

IIRC Oracle does support this and calls it a skip-scan. If the first column in the index is not very selective this can be very efficient, though if the first column is good in terms of selectivity then a skip-scan will be very inefficient and the DB might as well just scan the whole index. Given that a good choice of index usually has a fairly selective column as its first, and that unless storage space is very cramped you are likely better off just defining an extra index without the column that you want to not touch in some queries, this means that a skip-scan isn't helpful as often as one might think which is one of the reasons DB engine maintainers give for not spending time implementing and maintaining the feature.

There are some use cases where having the option to skip-scan is definitely a significant bonus, but they are rare enough that I understand why most engine maintainers have not implemented them.

Yes, I'm talking about skip-scan. I know pretty much all DBs can use compound index predicates for querying :)

I can think of several cases where it would have been nice to have, sometimes "good enough" for a particular read pattern is better than additional write overhead etc.

One arguement against implementing nice to have features in a SQL or similar engine is that it adds complexity to the query planner: to try make it bright enough to know when to use it without using it when it is significantly detrimental, without ballooning there time needed to make a good enough¹ choice, without making a less good choice often enough that the overall efficiency of the choices the planner makes is lowered.

Of course an obvious counter is to implement it as an optional feature only used when explicitly requested by a query hint, with the counter counter being "is it worth the implementation effort given how rarely it will get used in that case?".

Oracle's developers were seated by the benefits, other database developers have not been.

There are ways to "emulate" the behaviour using CTEs to gain the expected benefit in at least some cases, but I've not experimented with this myself so can't speak for how well it works in practise, or if the extra legwork makes queries too convoluted to be easily maintained (or understood easily at all be future maintainers).

----

[1] noting that query planners do not strive to produce the best query plan, that is a Turing complete problem in many cases, instead trying to make a good enough choice in a reasonable amount of time

Yep :) although it's much more expensive to get a query plan wrong... increasing plan time, even just 1ms to 2ms, will also significantly hurt throughout!

I maintain a couple fairly large db deployments totalling around 100 machines, and it's always fun to see how query planner jitter affects things on a larger scale.

Postgres is capable of doing this with some index types, but generally more slowly than a B-tree index being asked about its leftmost columns:

https://www.postgresql.org/docs/current/indexes-multicolumn....

Also worth noting that for very complex workloads that need to support arbitrary subsets of equality matches over columns, a Bloom filter might work best:

https://www.postgresql.org/docs/current/bloom.html

This dependz on the index type, but PostgreSQL also supports queries based on arbitrary attributes of multi-attribute indexes: both GIN and BRIN can be used for queries on any of the indexed expressions.

Maybe eventually PG will also get index skip scans for btrees, which could allow for arbitrary columns searches in the btree; but we're not there yet by a large margin.

Nice, I did not know MySQL supported this. Someday I'll work with MySQL again, good to know.
Good point, I reworded it to be more general. Can't recall seeing our DB being that clever.
> (but of course is less efficient)

This means the order does matter

Yes...?

Sorry my comment is not semantically accurate enough for you.