Hacker News new | ask | show | jobs
by timdorr 4011 days ago
Except PG index usage in the query plan is dependant on stats about the table (row count, width, etc). So, you would need to continually run it against production data and have it forever chase the perfect set of indices.

If you did end up making it, it should be named moby_dick.py.

1 comments

Or you tune for the best average execution time. Sample your queries and run them all through each query plan. One has to be optimal, unless all changes are negligible.
Even for a single query, the optimal query plan changes over time with the distribution of data in the table (and in other tables), as well as things like database load. There literally is not one best query plan, even with perfect information. Also, keep in mind that an abundance of even fake indices like this can have adverse effects on planner results by forcing it to consider more (and therefore, more suboptimal) cases with its limited budget.
And that's why people use NOSQL. It's much more predictable in production use.
Er, no. It isn't. NoSQL solutions have exactly the same problems. Most just don't have query planners or statistics capable of understanding how to properly deal with the ever-changing data (and also frequently don't even have the indexing capabilities required to do much more than a single get or set efficiently, anyway--these problems tend to come up with larger queries).
...Except the number of query plans is O(n!), where n is the number of joins.

MongoDB's query optimizer actually works like this, which is only tractable because it doesn't support joins.

I was about to write "Ummm...column order doesn't matter, so it's O(2^n)", but then I did some thinking and a bit of googling, and I guess it does. Yeah, O(n!).

Although, in practice, most tables would only have a relatively small number of columns upon which you'd want to consider building an index (usually on columns (often IDs) that you'd use to join to other tables), so that'd cut it down to O(m!) where m < n, and for larger tables, m << n.

<thinking out loud> Hmmmm, I wonder if you could use frequency information--or even the proportion of distinct values--of columns to guess a bit more intelligently... </thinking out loud>

Perhaps one could even use histograms to estimate selectivity of the query's join predicates to pick a good ordering of joins?

This is actually pretty much how a basic relational query optimizer works in practice.