Hacker News new | ask | show | jobs
by hibikir 3822 days ago
This sounds like the classic Postgres problems on large, insert-only tables. The default settings for statistics gathering just aren't tuned for tables like this. Now, normally the problem is not full table scans, but using extremely inefficient join strategies, but chances are it's the same problem.

The typical solution is to modify the autovacuum settings for that table to recalculate the statistics a lot more often, and maybe even with much higher resolution, depending on your case.

You can also convince it that indices are the way to go by changing more basic settings about costs of reading a random page on disk vs reading sequentially, making full table scans more expensive, but tuning those settings away from realistic costs might have negative side effects for you.

I was able to have great success running complex queries on 100M+ row tables that were insert-only using this kind of trick, but YMMV. If nothing else fails, really experienced people are more than willing to help in the performance mailing list. They sure helped me quite a few times.

1 comments

The first thing that I tried was a full VACUUM ANALYZE and then re-ran the same query. It didn't help. Therefore modifying autovacuum won't help.

Adjusting internal costs is promising, but I'd like to avoid going there exactly because of the possible negative side effects that you mention.

You can adjust the costs on a per-session (or even per-transaction) basis. Depending on the nature of your query, it might be worth it.