Hacker News new | ask | show | jobs
by DoubleFree 792 days ago
The postgres query optimizer will try to minimize the number of pages read from disk (and the number of intermediate pages written to disk). Benchmarking the query optimizer by making the shared buffers large enough to hold all the data therefore seems wrong, as you're then measuring the speed of the query optimizer and the join processor, instead of the quality of the generated query plans. It would not surprise me if the generated plans for these versions are actually all the same and this is only measuring execution speed.
2 comments

No, it optimizes cost, which includes pages read from disk _and_ things like CPU use. Cost is an arbitrary unit meant to correlate with time spent, not by disk loads, so it's completely reasonable to compare plans with everything loaded in RAM. (Cost is by convention scaled such that one page read from disk is 1.0, but that's a different things from “the optimizer will try to minimize the number of pages read from disk”. It could just as well have been scaled so that 1.0 was 1 ms on some arbitrary machine.)
It is certainly possible that the plans are similar, and that improvements to the execution engine are being measured. The join order benchmark was designed to test optimizer quality. It is worth noting that in addition to trying to measure the number of pages read from disk, the PG optimizer also tries to reduce the number of tuples examined by the CPU, the number of predicate evaluations, etc. All these numbers are rolled up into a "cost," which is the function that the optimizer minimizes.

It is also true that measuring cold cache and warm cache performance can produce different results, and this experiment is certainly in the warm cache scenario. But, the cold cache scenario suffers from the problem you mention as well: an improvement to PG's B-tree that saves a few IOs will dominate any kind of CPU-based improvement (at least at the data size of the join order benchmark).

FWIW, the plan for the query with the P90 latency changes from a plan that uses loop and merge join in PG8.4 to a plan that uses hash join in PG16 (where it is no longer the P90 query), which is at least some evidence of optimizer improvements.

> It is certainly possible that the plans are similar, and that improvements to the execution engine are being measured. The join order benchmark was designed to test optimizer quality.

I don't think that it's possible to test optimizer quality in isolation -- not really, not if it's to be in any way practical. Many (most?) individual improvements to the optimizer are hopelessly intertwined with executor enhancements. This is usually fairly obvious, because the same commit changes both the optimizer and the executor together. Sometimes it's much less obvious, though, because its more a case of the optimizer and executor coevolving.

It's probably still the case that a lot of the improvements seen here are pure executor enhancements, but I can't say that I'm very confident about that. (As the main person behind those B-Tree IO savings you might expect me to have more confidence than that, but I find it horribly difficult to tease these issues apart while keeping the discussion high level/relevant.)