Hacker News new | ask | show | jobs
by dilatedmind 2060 days ago
tangentially related, but have you ever had to deal with statistics on a table indexed on (x, date), where there are a few million rows added each day, and x could be [0,1000], with a distribution like 80% of rows owing to only a few values of x.

ran into a situation like this where after enough days of data had accumulated, postgres would eventually fall behind on updating stats, such that a week could lapse without stats being updated, causing the query planner to think no rows existed within that time range. This would result in a nested loop instead of a more efficient hash join, causing a query to take 2 hours instead of 2 seconds.

increasing the number of rows sampled with set statistics didn't seem to help. wound up running a cronjob to inspect pg_stats, and manually running analyze when enough days had lapsed without most_common_vals being updated.