Hacker News new | ask | show | jobs
by renhanxue 263 days ago
The core issue the article is pointing to is that most database indexes are B-trees, so if you have a predicate on the form (col_a = 'foo' OR col_b = 'foo'), then it is impossible to use a single B-tree lookup to find all rows that match the predicate. You'd have to do two lookups and then merge the sets. Some query optimizers can do that, or at least things that are similar in spirit (e.g. Postgres bitmap index scan), but it's much more expensive than a regular index lookup.
1 comments

> but it's much more expensive than a regular index lookup.

It doesn't have to be, it just "is" in some database engines for various historical reasons.

I.e.: PostgreSQL 18 is the first version to support "B-tree Skip Scan" operators: https://neon.com/postgresql/postgresql-18/skip-scan-btree

Other database engines are capable of this kind of thing to various degrees.

The linked article’s optimization applies to compound index queries, not “OR” condition optimization.

Unrelated or not, skip-scan will be useful in some cases. However, the cases where it adds noticeable benefit are the cases where a separate index should have been used for leading columns anyway (and in memory-constrained/frequently-cold-cache situations, a separate index might even be faster). If you can’t even begin to guess at the order of magnitude of cardinality, or if your leading-column-lacking queries are quite rare and not that perf-sensitive on a big table exerting index cache pressure, then skip scans make sense.

Skip-scan or similar code can solve the problem if you have a compound index on both columns.

The query planner can include the entire matching range for the 'A' column and check the 'B' column for matches via skip-scan.

This is possible in principle, but I don't believe most (any?) database engines use this specific approach.

It could be optimal if the results are required in A,B sorted order.

B-tree skip scan does not address the problem in the article at all.