Hacker News new | ask | show | jobs
by ninkendo 4011 days ago
So presumably you could do the following on your entire database:

    for each query in my codebase Q:
        for each possible combination of indexes I for Q:
            create hypothetical index I
            run EXPLAIN query Q
            if Q uses I, create I
If EXPLAIN doesn't actually run a query, and creating hypothetical indexes doesn't actually write anything, the above could presumably done in milliseconds, no?

Why not take this one step further and just enable an autoindexing mode in postgres using the query history as a heuristic?

2 comments

For the typical larger project, you'd wind up with hundreds of indexes each used by a single application query. The optimal index arrangement for a database often does not result in all columns being indexed on a query. It's normal and desirable (for index size and performance) to have fewer indexes that may only partially cover your queries. For example, a query with a WHERE of "a AND b AND c" is more often than not better handled with an index on (a) or (a, b) rather than (a, b, c). Especially if you have a second query with "a AND b AND d", you're most likely going to wind up with a single index for (a, b) rather than one for (a, b, c) and another for (a, b, d).

What could be useful would be a generated report for all your queries with possible suggestions, grouping together queries with the same partial prefix columns, and the cardinality that each index would result in. Then one could manually review to look for the most significant improvements possible. I think a fully automated attempt (even built-in heuristics that modified indexes over time) would sometimes result in a bad decision being made that could theoretically crash your application. Perhaps with a few years of perfecting a solution this could become the norm. :)

There's more to deciding whether to create any particular index than just whether it makes any of your queries faster. Each index can speed up SELECTs, but it also slows down any write queries on those tables. Each index also takes up space on disk and memory.

Then there's the question of how often which queries are run, and how important their performance is. Maybe query X is run by some background process, so nobody will notice if it takes 10x longer, but indexing the table to speed it up will slow down the INSERT statements on that table, which are in the user's path and will be noticed. Maybe it would also tax limited disk space on the DB server and require a hardware upgrade or a complex multi-server setup to work.

Coming up with an optimal indexing scheme for any particular database is a series of complex trade-offs which include taking customer needs and business requirements into account. I don't think it will be practical to do it automatically anytime soon.