Hacker News new | ask | show | jobs
by SigmundA 1903 days ago
>LIKE compares characters, and can be paired with wildcard operators like %, whereas the = >operator compares strings and numbers for exact matches. The = can take advantage of indexed columns.

Unless this specific to certain databases, LIKE can take advantage of indexes too, without wildcards LIKE should be nearly identical in performance to = both seeking the index.

>Using wildcards for searching can be expensive. Prefer adding wildcards to the end of strings. Prefixing a string with a wildcard can lead to a full table scan.

Which is contradictory to the first quote, it seems you recognize that a wildcard at the end can take advantage of an index. Full table scan is the same thing as not taking advantage of an index, hence LIKE can take advantage of normal indexes so long as there are characters before the first wildcard or has no wildcards.

1 comments

LIKE 'abc%' will use indexes but LIKE '%abc' will not.

At least for the latest versions of every database. If you go back to a version from 10+ years ago there's no guarantees.

Pedantically if your database supports index scans it can use the index on the column to scan for '%abc' rather than the whole table which can be much faster while not as a fast as a seek.

It can only do a seek if there are character before the wildcard: 'ab%c', 'abc%' and 'abc' getting progressively faster due to less index entries transversed.

> it can use the index on the column to scan for '%abc'

Using an index would just mean more overhead to fetch data later, so optimizers will prioritize a table scan in these cases since it would have less cost.

I think it would depend, wouldn't it? If the query can be answered directly from an index (there exists some index containing all of the columns required by the query) then an index scan would suffice and be faster by virtue of not having to scan all the data (the index would be smaller by not including all columns). I believe most modern DB query optimizers are capable of this.

If there isn't such an index, then it's a toss up: yes, going to the main table to fetch a row has a cost, but if there are only a few rows answered by the query, then it might be worth it. If there are many rows, that indirection will probably outweigh the benefit of the index scan & we'd be better off with a table scan. This would require an optimizer to estimate the number of rows the query would find. I don't know if modern DB query optimizers would do this or not. (And my naïve guess would be "they don't", specifically, that the statistics kept are not sufficiently detailed to answer any generalized LIKE expression.)

> I think it would depend

Not for LIKE clauses using suffix wildcards, unless you create an index specifically using such a condition (CREATE INDEX IX_blah ON table (column) WHERE column LIKE '%abc');

Depends on estimated selectivity and if the index covers the result as well.

If the criteria would fetch few rows out of many it can be faster to scan the index then retrieve the few matching results and even better if the index covers the results it never touches the table itself (index only scan).

> Depends on estimated selectivity

This can't be determined with LIKE suffix wildcards and that's not how any of the commonly-used index data structures work (b-tree, hash, gist, or bitmap). Index metadata will not help in eliminating leaf pages, and every row is going to need to be scanned.

Yes every row of the index needs to be scanned not every row of the table which is faster than scanning the table.

I am most familiar with MS SQL server and it will most certainly do an index scan for what it thinks is a highly selective predicate with "suffix wildcards" and it can return results faster than scanning the table.

If the index covers the result columns it will scan the index and never touch the table otherwise it will do a key lookup to the table.