Hacker News new | ask | show | jobs
by btilly 3823 days ago
Yay! I like the changes.

But they are doing absolutely nothing about my biggest beef with PostgreSQL. Which is that there is absolutely no way to lock in good query plans. It always reserves the right to switch plans on you, and sometimes gives much, much, much worse ones. No other database does this to me. Even MySQL's stupid optimizer can be reliably channeled into specific query plans with the right use of temporary tables and indexes.

This is a problem because improvements don't matter if the query plan is "good enough". But they will care if you screw up. PostgreSQL usually does well, but sometimes screws up spectacularly.

The example that I have been struggling the most often with in the last few months is a logging table that I create summaries from. Normally I only query minutes to hours, but I set it up as a series of SQL statements so I first put the range in a table, and then have happened BETWEEN range_start AND range_end. PostgreSQL really, Really, REALLY wants to decide that the index on the timestamp is a bad idea, and wants to instead do a full table scan. Every time it does, summarization goes from under a second to taking hours.

Hopefully the new BRIN indexes will be understood by the optimizer in a way that makes it happier to use the index. But I'm not optimistic. And if I lean on it harder, I'm sure from past experience that I'll find something else that breaks.

4 comments

There is some discussion on why the Postgres team dislikes query hints here: https://wiki.postgresql.org/wiki/OptimizerHintsDiscussion

But perhaps I also have some practical advice to try.

I had a similar issue: I have a few tables with sensor data, 300-500 million rows, indexed among other things by event type. Some counting queries kept defaulting to full table scans. It turned out that this was because of limited statistics on the distribution of counts by event type.

The default_statistics_target config parameter sets how many entries Postgres keeps in the histogram of possible values per column, the default is 100 I think. Because my event types were not evenly distributed, the less frequent ones were missing from the statistics histogram altogether, and somehow this resulted in bad query plans.

As a fix, I upped the default_statistics_target to 1000, and to set it to 5000 for the biggest tables. Then after a vacuum analyze, the query planner started making sensible choices.

Another thing to try is perhaps reducing the random_page_cost config parameter from it's default of 4.0. On SSDs, random page costs are much closer to 1 than they are 4 (compared to long sequential reads).

This is all good optimization advice, but I don't think it is applicable to my specific case.

My problem is not that PostgreSQL does not understand the distribution of my data. It does. The problem is that it comes up with a query plan without realizing that I'm only querying for a very small range of timestamps.

If this happens again, I'll have to try rewriting code to send it queries with hard-coded timestamps, cross fingers and pray. I find prayer quite essential with PostgreSQL sometimes because as ineffective as it is, at times I've got nothing else.

Have you talked about this in the dev mailing list? I'm sure they would help and likely consider adding something to improve the next version.
> My problem is not that PostgreSQL does not understand the distribution of my data. It does. The problem is that it comes up with a query plan without realizing that I'm only querying for a very small range of timestamps.

Which version did you reproduce that on? While the problem has not been generally addressed, the specifically bad case of looking up values at the "growing" end of a monotonically increasing data range has been improved a bit over the years (9.0 and then some incremental improvement in 9.0.4).

PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-16), 64-bit

If it matters, it is an Amazon RDS instance.

I think this is a pretty important area to work on. Not in the direction of query hints, but rather have "approved" query plans, and an interface to see new query plans, and how much their costs differ.

That'd not only make production scenarios more reliable, but it'd also make it much easier to test tweaks to the cost model in practice.

The problem is that that's a rather significant project, and it's hard to get funding for that kind of work. The postgres companies aren't that big, and it's not an all that sexy feature marketing wise.

It may not look like a sexy marketing feature. But it is the #1 reason why I don't recommend PostgreSQL.

I also don't think that query hints are a good way to do it. And I don't mind if the way to do it is somewhat cumbersome. This is very much a case where 20% of the work can give 99% of the benefit.

For example what about the following approach?

1. Add an option to EXPLAIN that will cause PostgreSQL's optimizer to spit out multiple plans it considered, with costs, and with a description of the plan that PostgreSQL can easily parse and fit to a query.

2. Add a PLAN command that can be applied to a prepared statement and will set its plan. It is an error to submit a plan that does not match the query.

And now in the rare case where I don't like a query's plan I can:

    EXPLAIN PLANS=3 (query);
Pick my desired plan from the list (hopefully)

Then in my code I:

    PREPARE foo AS (query);
    PLAN foo (selected plan);
    EXECUTE foo;
And now if I notice that a query performs worse than I think it should, I can make it do what I want it to.
> 1. Add an option to EXPLAIN that will cause PostgreSQL's optimizer to spit out multiple plans it considered, with costs, and with a description of the plan that PostgreSQL can easily parse and fit to a query.

The biggest problem with that approach is that the way query planning works isn't that a 100 different plans are fully built, cost evaluated, and then compared. Instead it's more like a dynamic programming approach where you iteratively build pieces of a query plan from ground up, and then combine those pieces to build the layer one up. Given the space of possible query plans, especially with several relations and indexes on each relation involved, such an approach is required to actually ever finish planning.

> Add a PLAN command that can be applied to a prepared statement and will set its plan. It is an error to submit a plan that does not match the query.

It's not easily, if at all in the generic case!, possible to prove that a specific plan matches a query. You could obviously try to build every possible plan and match against each of those, but that's computationally infeasible (we're talking factorial number of plans, depending on relations here).

So I think such an approach has no chance of working.

What's more realistic is a running queries in a "training" mode. That training mode would, matching on the specific parsetree, store the resulting plans in a table. Before exiting training mode you'd mark all these plans approved (after looking for bad cases obviously). After that preparing a new query still does the original query planning, but by default the query stored in the "approved plans" table would be used. The cost differential and the new plan would then be associated with the currently approved plan. Regularly the DBA (or whoever fulfills that role), checks the potential plans and approves new ones.

Based on a configuration option queries without approved plans would error out, raise a log message, or just work.

Now even that has significant problems because e.g. DDL will have the tendency to "invalidate" all the approved plans. But that's manageable in comparison to being woken up Friday night.

I don't see how my objections are impossible.

On EXPLAIN, if you've passed my example PLANS=3 it would first do the plan in its usual way, and then try optimizing several more times, with some chance of randomly making suboptimal decisions at various decision points. It would keep doing this until it either had enough plans or else was making essentially random decisions and still couldn't find more.

I can see this requiring a significant refactor of existing code, but stochastically exploring "pretty good" plans doesn't require facing the whole tree of possible plans.

The DBA hopefully can recognize the desired plan if it turns up.

As for the PLAN command, I do not see the problem. I can look at the query and the EXPLAIN output, and I can figure out exactly how and where that query's conditions are being incorporated. You might need extra output from EXPLAIN to make it always possible to do in an automated way, but it should be possible.

Put another way, you propose that the database has a way to look at the query and a plan stored in the table and figure out how to execute that plan for that query. What would that table contain that couldn't be represented as a chunk of text supplied with a PLAN command?

> On EXPLAIN, if you've passed my example PLANS=3 it would first do the plan in its usual way, and then try optimizing several more times, with some chance of randomly making suboptimal decisions at various decision points. It would keep doing this until it either had enough plans or else was making essentially random decisions and still couldn't find more.

You'd not get anything useful by doing that. If you really consider this as a dynamic programming problem, at which place in the 'pyramid' of steps would you choose the worse plan? To quote the source:

        /*
	 * We employ a simple "dynamic programming" algorithm: we first find all
	 * ways to build joins of two jointree items, then all ways to build joins
	 * of three items (from two-item joins and single items), then four-item
	 * joins, and so on until we have considered all ways to join all the
	 * items into one rel.
	 *
If you'd just make some random 'bad' decisions, you'll not have a significant likelihood of finding actually useful good plans.

> As for the PLAN command, I do not see the problem. I can look at the query and the EXPLAIN output, and I can figure out exactly how and where that query's conditions are being incorporated. You might need extra output from EXPLAIN to make it always possible to do in an automated way, but it should be possible.

Good luck. Besides generating all possible plans postgres knows to generate, you very quickly essentially run into something equivalent to the halting problem.

> Put another way, you propose that the database has a way to look at the query and a plan stored in the table and figure out how to execute that plan for that query.

What I'm proposing is matching on the parse tree of the user supplied query. For each query there's exactly one parsetree the postgres parser will generate. We have a way (for the awesome pg_stat_statements module) of building a 'hash' of that parse tree, and thus can build a fairly efficient mapping of parsetrees to additional data. In contrast to that, for plans you can have a humongous number of plans for each user supplied query.

> What would that table contain that couldn't be represented as a chunk of text supplied with a PLAN command?

It would contain a, less ambiguous, version of the user supplied query. Now you could argue that you could add that to the PLAN command for matching purposes - but then we'd need guarantee that you could supply arbitrarily corrupt plan trees to postgres, without being able to cause harm. Something that'd cause significant slowdown during execution.

EDIT: Formatting

You just use the approach used in simulated annealing. You set a threshold for making random decisions, and adjust that probability up and down as you try to optimize. For example you might say that in each step there is a 10% chance of adjusting the COST factor randomly by a factor between 0.2 to 5. If you've got 3 tables, most of the time you will come to the current optimal decision. Most of the rest of the time you will make one suboptimal choice. Run that a few times and you'll should several plans that are suboptimal but not actually horrible. Keep running and playing with the randomness factors until you either have enough plans, or are basically making entirely random choices and can't get enough.

As to the algorithm described in the comment, it is actually able to search through all plans. The dynamic programming bit just means that you don't have to unnecessarily traverse all possible logic paths that a naive recursive algorithm might. But if there is a good plan, it can be discovered.

It is unclear from the comment whether the algorithm can only find plans which involve adding one table at a time. This would not be unreasonable - for example I know that Oracle circa a decade ago did that to avoid having to consider too many query plans on large joins. I have no idea whether that ever got changed.

As to the parse tree, that seems like overkill to me. You have a query. It has a list of tables, and a list of join conditions. A query plan includes tables, indexes, filters, and so on. It could easily be augmented by stating which query condition was applied where.

To validate it you have to see that the lists of tables match, the list of applied conditions match, and each plan step could actually have the effect of doing that condition. If it all ties out, it is a valid plan. The fact that the plan might be crap is something that is explicitly left in the hands of the user - that's the whole point of the feature. But at that point it is clearly valid.

I've always thought that there should be an option to allow the optimiser to try out different alternative plans during slack periods, in the background. It should then compare the performance of these alternatives to the original, then "change it's mind", if it finds a faster plan. It should use statistics to choose which plans to test, i.e. prioritise queries which are often called and which are slow.
This sounds like the classic Postgres problems on large, insert-only tables. The default settings for statistics gathering just aren't tuned for tables like this. Now, normally the problem is not full table scans, but using extremely inefficient join strategies, but chances are it's the same problem.

The typical solution is to modify the autovacuum settings for that table to recalculate the statistics a lot more often, and maybe even with much higher resolution, depending on your case.

You can also convince it that indices are the way to go by changing more basic settings about costs of reading a random page on disk vs reading sequentially, making full table scans more expensive, but tuning those settings away from realistic costs might have negative side effects for you.

I was able to have great success running complex queries on 100M+ row tables that were insert-only using this kind of trick, but YMMV. If nothing else fails, really experienced people are more than willing to help in the performance mailing list. They sure helped me quite a few times.

The first thing that I tried was a full VACUUM ANALYZE and then re-ran the same query. It didn't help. Therefore modifying autovacuum won't help.

Adjusting internal costs is promising, but I'd like to avoid going there exactly because of the possible negative side effects that you mention.

You can adjust the costs on a per-session (or even per-transaction) basis. Depending on the nature of your query, it might be worth it.
You can push it in favor of certain options by disabling the one you don't want, such as sequential scan.

SET enable_seqscan = OFF;

We use these options a lot on tables that result in odd query plans to get them doing the best option..

That's a random sledgehammer, but that is how I have been solving the problem. I've set enable_seqscan, enable_nestloop and enable_material to false and it is working at the moment. At first I only turned off enable_seqscan, but then I turned off the other two after we switched database hosts and the query went belly up.

What scares me is that this is unreliable, and according to the documentation, the optimizer is free to choose to ignore everything that I say whenever it wants. The fact that it already HAS done that to me does not provide me comfort.

Disabling sequential scans is generally reliable. It sets the planner cost of seq_scan excessively high, so the planner will only choose it if there's no other choice (at least, that's been my experience). Of course, if your query is complex enough to trigger GEQO that might not work out.