Hacker News new | ask | show | jobs
by jzelinskie 2779 days ago
I tend to dislike cost-based optimizers because they add a layer of uncertainty when you're attempting to optimize database queries. I know what I want the optimizer to do. The problem is that when I run test queries against a local database or even a staging database, the statistics used to calculate costs differ. This means production can do something totally undesirable.
4 comments

Yes, that's a real problem that DBAs routinely grapple with. To address the problem, RDBMS vendors typically support hints and "plan management" features that allow DBAs to "pin" a particular plan once they've determined it's best. However, this usually requires a knowledgeable DBA; many developers would prefer that the database automatically chooses the plan, even if it changes from time to time. CockroachDB will definitely want to support both kinds of users.
I think is best if it was a per-session hint, like:

SET PLANNER TO COST

or similar, and default to a more "static" one.

Yeah, the problem is that basically none of this is portable, though. So if you're attempting to support both MySQL variants and Postgres variants, it gets nasty fast.
Not portable and you have to deploy separate from your application. A company I worked for solved this by using/setting hints for queries that were known to be mis-optimised in the common case.

If you're looking to be cross platform then hopefully you've already got a layer to translate intent to an appropriate SQL dialect, so you can drop it in there.

Hear, hear
Some databases let you just set the stats numbers to what you think they are and pin them, so changes to data won't change the stats and the optimizer will always use your stats and not stats calculated from the data. You'll get the "prod" execution plans on an empty database!
You can do that in Db2 z/os. In our shop we do it daily. In prod stats are up to date, we have job that would get all the stats from prod required for accesspaths and apply it on Dev empty tables modelling production. So Dev team when they bind packages which would use these stats to generate accesspaths 99.99% of the time same accesspaths would be generated in Dev and we know how the program would perform. Also for some tables we set pre defined stats, if that table a had real stats, oboy.
Sql server let you to import meta data from prod to dev. And you will get reliable execution plans.
> I know what I want the optimizer to do

For a cute little select statement, it might be true. However it's just not manageable most of the time.

Also Cost-Based can do some really dumb things if a table's contents change by quite a bit before stats are run again. I like being able to give hints so that you can switch between Cost or hierarchy based queries.