| There is some discussion on why the Postgres team dislikes query hints here: https://wiki.postgresql.org/wiki/OptimizerHintsDiscussion But perhaps I also have some practical advice to try. I had a similar issue: I have a few tables with sensor data, 300-500 million rows, indexed among other things by event type. Some counting queries kept defaulting to full table scans. It turned out that this was because of limited statistics on the distribution of counts by event type. The default_statistics_target config parameter sets how many entries Postgres keeps in the histogram of possible values per column, the default is 100 I think. Because my event types were not evenly distributed, the less frequent ones were missing from the statistics histogram altogether, and somehow this resulted in bad query plans. As a fix, I upped the default_statistics_target to 1000, and to set it to 5000 for the biggest tables. Then after a vacuum analyze, the query planner started making sensible choices. Another thing to try is perhaps reducing the random_page_cost config parameter from it's default of 4.0. On SSDs, random page costs are much closer to 1 than they are 4 (compared to long sequential reads). |
My problem is not that PostgreSQL does not understand the distribution of my data. It does. The problem is that it comes up with a query plan without realizing that I'm only querying for a very small range of timestamps.
If this happens again, I'll have to try rewriting code to send it queries with hard-coded timestamps, cross fingers and pray. I find prayer quite essential with PostgreSQL sometimes because as ineffective as it is, at times I've got nothing else.