Hacker News new | ask | show | jobs
by pradocchia 4495 days ago
> Is it?

At least on MSSQL, I would expect a query plan like so:

  1. Index seek WHERE a = 123, yielding ~100 rows. [1]
  2. Bookmark lookup with results from (1), yielding ~100 rows.
  3. Filter (2) WHERE b = 42 and project date_column, yielding ~10 rows.
  4. Aggregate (3) by date_column, yielding ~10 rows or less.
And the optimizer will choose this over a full table scan so long as the 1+2+3 < full table scan. I don't know the threshold for that, but is is certainly more than 100 rows out 1M+, and the planner will have an estimate of selectivity that will inform plan selection.

[1] Important caveat. I interpret this line,

Current situation, selecting about hundred rows out of a million:

....to mean selection of 100 rows from the base table, rather than projection of 100 rows in the result, post-aggregation.

But if he really means a SELECT statement that returns 100 rows, then we have no idea how selective the WHERE clause is, and my answer changes to "The query will be much slower (impact >10%)".

1 comments

I took the latter interpretation. The "correct" solution uses the fact the first query can be solved by selecting 0 rows from the base table.