Hacker News new | ask | show | jobs
by thargor90 1030 days ago
Yes, this explains the how, but not the why.

Why is there no optimization in place for this? Converting a=x or a=y or a=z to a in(x,y,z) should be trivial and the db should have heuristics to calculate the expected query cost to decide when to apply this transformation.

4 comments

Yes, and Postgres remains staunchly opposed to planner hints because the planner knows better! It always computes the optimal query plan!

The "optimal query plan" changes at the drop of a hat, as you can see in this case. Absolutely trivial syntax changes result in a completely different query, sometimes turning a sequential scan into an index-only scan or vice versa. So, 100x difference in query time, it doesn't just do that for small tables.

To add to this, did you know changing the order of your joins can impact the query plan chosen?
At the scale of three comparisons, performance is equivalent enough that there's no gain.

At the scale of 5,000 comparisons, the language's structure itself heavily disincentivizes writing it as an OR query, unless you're using a code generator that doesn't care about language structure.

My wild guess is that it's a rare enough corner case that it wasn't worth burning the resources on yet.

My gut says it's a resource issue. Oracle has the time and money to optimize a gajillion scenarios that increase performance without making developers think things through. Which makes the product easier to use and seem faster. It's very simple to code this one, but then there are 10,000 other cases you need to code too to make it cover a large percentage of potential optimizations.
Indeed, Postgres may be wildly popular these days thanks to being $free, but it's far from being the most intelligent relational database engine on offer.
It's not that easy to analyze a binary logic predicate to extract a set of possible values (especially given the three-value logic that SQL uses, where `a=x or a≠x` is not true). It's easy in the easy cases of course, but doing the analysis for a more complex expression quickly gets ugly. And I'm not sure it's easy to tell ahead of time if an expression will be easy or hard to analyze. They could hard-code a few simple patterns, but at that point it may get very unpredictable whether the optimization will be applied or not, defeating the purpose.
Every optimizing compiler does this kind of analysis and then some.

It's already completely unpredictable which kind of trivial optimization Postgres fails to apply, it could hardly get any worse.