| In general I love what RDBMS and postgresql in particular can bring to you, but this is one corner of them that I hate: Planners are too smart for their own good. This is a standard story: A query ran for a long time without issue, and then, one day, some minor shift in your statistics happens, and now you have a major performance issue in your hands, without any real change on prod. No non-productive environment can help you: They don't have the same data or they do have the same data but the statistic sampling was slightly different. If it happens, you have to find an incantation that makes the planner comprehend what's going wrong. Postgresql has things like the statistic object and in this case the statistic column property, but finding the right incantation can be black magic, and adding indexes/statistics/... can take hours so trial and error is slow. Dumber databases have an edge here: Their performance is probably lower, but it is predictable. Some RDBMS have mitigations, e.g. Oracle's optimizer plan stability allows you to make the plan unchangeable. It's a 2-sided knife of course: It won't get better if the data has a chance for it, but it won't get worse either. |
> No non-productive environment can help you: They don't have the same data or they do have the same data but the statistic sampling was slightly different.
GoCardless still has a massive Postgres database (10TB or there-abouts) and only managed to scale it by investing heavily in tooling that helps developers work with it safely.
One example is https://github.com/gocardless/draupnir, a tool to create instances of production datasets very quickly (just `eval $(draupnir new); psql` and you have a mini production in ~3s) so you could try things like adding indexes, tweaking the plan settings (`set enable_seq_scan='off'`) and reshaping the data to see how your planner behaved.
I think it's very doable, though the planner still has blindspots. I had a side project to add linear correlation statistics to the planner that I abandoned when I stopped working with big Postgres databases, but that's an example of statistics that Postgres just doesn't track but lead to these pathological edge cases.
I'd rather have the clever planner than not, though. I've a healthy appreciation for the heavy lifting Postgres can do for you.