Hacker News new | ask | show | jobs
by davidrowley 861 days ago
> Multi-selectivities are good, but IIRC they can't be specified across tables and thus across joins, right?

Yeah, no extended statistics for join quals yet.

> how do you reconcile multiple selectivities?

Looking at https://doxygen.postgresql.org/extended__stats_8c.html#a3f10... it seems the aim is to find the stats that cover the largest number of clauses tiebreaking on the statistics with the least number of keys. For your example both of those are the same, so it seems that which stats are applied is down to the order the stats appear in the stats list. That list is ordered by OID, which does not seem ideal as a dump and restore could result in the stats getting a different OID. Seems sorting that list by statistics name might be better. That's what we do for triggers, which seems like a good idea as it gives the user some ability to control the trigger fire order.

1 comments

OK, so basically ad-hoc/greedy. That can get you into, well, issues :-) There are methods where you first adjust away impossible statistics (if you have quals a,b, then P(a)P(b) >= P(a AND b), but if your estimates come from multiple places this might not hold[1] -- so you need to fudge one or more of those before proceeding), and then combine them optimally using Newton-Raphson. It's pretty neat. But it does require a matrix library and stuff :-)

[1] Similarly, you can sometimes guarantee stuff like P(a AND b) <= 1/n due to a unique multi-column index, which can also inform some of your sub-estimates.