It’s like writing a compiler or interpreter: writing one is easy; writing a good one extremely hard.
This suggester isn’t very good. It takes a single query and suggests indexes for it. A good one would take a mix of queries and suggest a set of indexes, also considering the impact on write speed of additional indexes (table updates often need to update indexes, too)
For the example in this article, if the table is large and the average number of rows with a given ‘a’ value is close to 1 or if most queries are for ‘a’ values that aren’t in the database, it may even be better to do
CREATE INDEX x1a ON x1(a);
That gives you a smaller index, decreasing disk usage.
As one of the authors of the tool that was critiqued in depesz' blog post from last year, I fully agree that looking at this on a per-query basis is too simple.
Since we wrote our initial index suggestion tool for Postgres, we actually went back to the drawing board, examined the concerns brought up, and developed a new per-table Index Advisor for Postgres that we recently released [1].
The gist of it: Instead of looking at the "perfect" index for each query, its important to test out different "good enough" indexes that cover multiple queries. Additionally, as you note, the write overhead of indexes needs to be considered (both from a table writes / second approach, as well as disk space used at a given moment in time).
I think this is a fascinating field and there is lots more work to be done. I've also found the 2020 paper "Experimental Evaluation of Index Selection Algorithms" [2] pretty useful, that compares a few different approaches.
> This suggester isn’t very good. It takes a single query and suggests indexes for it.
The underlying API can analyse multiple queries - looks like they've only coded up the test `.expert` command for one.
From [1], "The sqlite3expert object is configured with one or more SQL statements by making one or more calls to sqlite3_expert_sql(). Each call may specify a single SQL statement, or multiple statements separated by semi-colons." then "sqlite3_expert_analyze() is called to run the analysis."
>It takes a single query and suggests indexes for it. A good one would take a mix of queries and suggest a set of indexes, also considering the impact on write speed of additional indexes (table updates often need to update indexes, too)
This is my pet peeve with SQL Server SSMS will give you a missing index suggestion and cost... the problem is inexperienced people will take the suggestion as is and create way too many highly specialized indexes over time.
This suggester isn’t very good. It takes a single query and suggests indexes for it. A good one would take a mix of queries and suggest a set of indexes, also considering the impact on write speed of additional indexes (table updates often need to update indexes, too)
For the example in this article, if the table is large and the average number of rows with a given ‘a’ value is close to 1 or if most queries are for ‘a’ values that aren’t in the database, it may even be better to do
That gives you a smaller index, decreasing disk usage.