|
|
|
|
|
by RMarcus
798 days ago
|
|
It is certainly possible that the plans are similar, and that improvements to the execution engine are being measured. The join order benchmark was designed to test optimizer quality. It is worth noting that in addition to trying to measure the number of pages read from disk, the PG optimizer also tries to reduce the number of tuples examined by the CPU, the number of predicate evaluations, etc. All these numbers are rolled up into a "cost," which is the function that the optimizer minimizes. It is also true that measuring cold cache and warm cache performance can produce different results, and this experiment is certainly in the warm cache scenario. But, the cold cache scenario suffers from the problem you mention as well: an improvement to PG's B-tree that saves a few IOs will dominate any kind of CPU-based improvement (at least at the data size of the join order benchmark). FWIW, the plan for the query with the P90 latency changes from a plan that uses loop and merge join in PG8.4 to a plan that uses hash join in PG16 (where it is no longer the P90 query), which is at least some evidence of optimizer improvements. |
|
I don't think that it's possible to test optimizer quality in isolation -- not really, not if it's to be in any way practical. Many (most?) individual improvements to the optimizer are hopelessly intertwined with executor enhancements. This is usually fairly obvious, because the same commit changes both the optimizer and the executor together. Sometimes it's much less obvious, though, because its more a case of the optimizer and executor coevolving.
It's probably still the case that a lot of the improvements seen here are pure executor enhancements, but I can't say that I'm very confident about that. (As the main person behind those B-Tree IO savings you might expect me to have more confidence than that, but I find it horribly difficult to tease these issues apart while keeping the discussion high level/relevant.)