|
|
|
|
|
by RMarcus
787 days ago
|
|
I'm curious to hear a bit more of your opinion. For example, I'm surprised that syscall latency is something near the top of your list. I think the usual wisdom in the DB community is that the cost models are mostly fine, but the cardinality estimation is really bad. In terms of the deferred / alternative planning, do you think adaptive query execution is a reasonable way to achieve this? It certainly allows for information early in the query execution to impact later plans. My worry with these approaches is that if you get the first couple of joins wrong (which is not uncommon), unless you have something like Yannakakis/SIPs, you still can't recover. I am obviously biased on the whole "ML for query optimization" thing. One thing I would note is that every "ML for planning" approach I've seen does, under the hood, use ML for cost discovery/estimation. These approaches are just trying to balance the data they collect (exploration) with the quality of the plans they produce (exploitation). Interestingly, if you use ML in a way that is completely removed from planning, you actually get worse query plans despite more accurate estimates: https://people.csail.mit.edu/tatbul/publications/flowloss_vl... (again, I've got a horse in this race, so my opinion should come with a side of salt :D) |
|
I don't want to be tuning this, it takes a lot of time to do it properly and I haven't retested this in a long time. I just set something that has worked in the past which is probably bad.
I completely agree that Postgres should be able to figure this out on its own. This is just an example, there are more such parameters which should be adjusted to the hardware but most people will leave the defaults.