| HN Mirror

Let us say that in our example table we have 100 000 records with severit_id < 7, 200 000 with severity_id = 7 and 3 records with severity_id = 8.

Statistics claim 100k id < 7, 200k id = 7 and 0 with id > 7. The last 3 updates could have happened right before our query, the statistics didn't update yet.

Let us assume that we blindly trust the statistics and they currently state that there are absolutely no values with severity_id > 7 and you have a query WHERE severity_id != 7 and a partial index on severity_id < 7.

If you trust the statistics and actually use the index the rows containing severity_id = 8 will never be returned by the query even if they exist. So by using the index you only scan 100 k rows and never touch the remaining ~200k. However this query can't be answered without scanning all ~300k records. This means, that on the same database you would get two different results for the exact same query if you decided to drop the index after the first run. The database can't fall back and change the plan during execution.

Perhaps I misunderstood you originally. I thought you suggested that the database should be able to know that it can still use the index because currently the statistics claim that there are no records that would make the result incorrect. You are of course correct, that the statistics are there to guide the planner choices and that is how they are used within PostgreSQL - however some plans will give different results if your assumption about data are wrong.