Hacker News new | ask | show | jobs
by ProblemFactory 3823 days ago
There is some discussion on why the Postgres team dislikes query hints here: https://wiki.postgresql.org/wiki/OptimizerHintsDiscussion

But perhaps I also have some practical advice to try.

I had a similar issue: I have a few tables with sensor data, 300-500 million rows, indexed among other things by event type. Some counting queries kept defaulting to full table scans. It turned out that this was because of limited statistics on the distribution of counts by event type.

The default_statistics_target config parameter sets how many entries Postgres keeps in the histogram of possible values per column, the default is 100 I think. Because my event types were not evenly distributed, the less frequent ones were missing from the statistics histogram altogether, and somehow this resulted in bad query plans.

As a fix, I upped the default_statistics_target to 1000, and to set it to 5000 for the biggest tables. Then after a vacuum analyze, the query planner started making sensible choices.

Another thing to try is perhaps reducing the random_page_cost config parameter from it's default of 4.0. On SSDs, random page costs are much closer to 1 than they are 4 (compared to long sequential reads).

1 comments

This is all good optimization advice, but I don't think it is applicable to my specific case.

My problem is not that PostgreSQL does not understand the distribution of my data. It does. The problem is that it comes up with a query plan without realizing that I'm only querying for a very small range of timestamps.

If this happens again, I'll have to try rewriting code to send it queries with hard-coded timestamps, cross fingers and pray. I find prayer quite essential with PostgreSQL sometimes because as ineffective as it is, at times I've got nothing else.

Have you talked about this in the dev mailing list? I'm sure they would help and likely consider adding something to improve the next version.
> My problem is not that PostgreSQL does not understand the distribution of my data. It does. The problem is that it comes up with a query plan without realizing that I'm only querying for a very small range of timestamps.

Which version did you reproduce that on? While the problem has not been generally addressed, the specifically bad case of looking up values at the "growing" end of a monotonically increasing data range has been improved a bit over the years (9.0 and then some incremental improvement in 9.0.4).

PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-16), 64-bit

If it matters, it is an Amazon RDS instance.