Hacker News new | ask | show | jobs
by williamdclt 824 days ago
> I like Postgres stance that if query planner doesn't do a good job, then dba should first update table/column statistics, and if things are truly bad, submit but to Postgres so the query optimizer can be updated itself.

That’s a very unhelpful stance when I’m having an incident in production because PG decided to use a stupid query plan. Waiting months - years for a bugfix (which might not even be backported to my PG version) is not a solution.

I agree that hints are a very dangerous thing, but they’re invaluable as an emergency measure

3 comments

I like it as an emergency measure, but I often see them used when there's a shallow understanding of operating the db.

Before using a hint or rewriting a query to force a specific plan, I try and push the team to do these things:

1. Run `vacuum analyze` and tune the auto vacuum settings. This fixes issues surprisingly often. 2. Increase statistics on the table. 3. Tweak planner settings globally or just for the query. Stuff like `set local join_collapse_limit=1` can fix a bad plan. This is pretty similar to hinting, so not a huge argument that this is better beyond not requiring an extension.

All those methods are try and guess. With hints you can have a scientific approach to understand why the bad plan has been chosen and find the right plan. Then, you can address the root cause. join_collapse_limit=1 may set the join order but not the join direction, so that's not enough if cardinality is misestimated. And pg_hint_plan can set this parameter for one statement if that's what you want, better than setting for the transaction
In a database supporting several applications or even just a large data model it can be rather difficult to ensure a global setting to the query planner doesn't cause a regression to other queries. A query hint can be a nice way to quickly solve a performance issue without risk of regressing elsewhere. Agree they should be used as a last resort and as a sort term fix while a better, longer term fix is investigated but they are a critical tool to have in your tool box especially as postgres moves into the business critical domains occupied by the commercial database vendors.
`set local` allows tuning query planner settings scoped to a transaction, so you don't have to do it globally.
There are other ways to achieve that (optimizer features and decisions can be influenced by config settings, or by altering entries in statistics tables), but it's admittedly quite clunky. I agree that a production incident at 3am is definitely not the right time to start grokking these settings. The extension can be a good remedy for such cases.
I have been burned before when a query on a newer db version was "optimized" in a different way that caused performance to drop but I have not yet had a query on the same version vary so drastically as to cause a problem.

Is this a more common occurrence that I just have not encountered before?

It is fairly common in e.g. multi-terabyte Postgres databases. Beyond a certain scale, it can become impossible for the statistics collector to build an accurate model for large tables. When this happens, you can end up in the strange place where the query plans change every time you run analyze even though the data model and data distribution hasn't changed. I've seen operational issues at multiple companies due to the query plans randomly changing like this on large tables in Postgres.
Not common (if it was anything more than exceptional, PG wouldn’t be a production ready product) but it happens: https://www.postgresql.org/message-id/17540-7aa1855ad5ec18b4...