Hacker News new | ask | show | jobs
by tluyben2 1537 days ago
Yeah, that works. My process is very different, but your advice is better as it hangs more on actionable tactics than learning intuition.

> Once you understand the underlying data structures, all the magic goes away. As it should.

Once you actually understand them, I feel you don’t need explain in most cases; you will simply ‘see’ why certain queries or definitions or structures are bad.

2 comments

> Once you actually understand them, I feel you don’t need explain in most cases; you will simply ‘see’ why certain queries or definitions or structures are bad.

Ummmm.... my experience pushes me strongly in the other direction. The same query's behaviour can be utterly different depending on the distribution of the data it runs on, and the parameters, and the currency of the statistics, and maybe memory and cpu pressure, and more. One of the heads of the team that wrote SQL server said "don't try to outguess the optimiser". That's very good advice that's served me well, otherwise it's exactly like optimising a program without profiling it; just don't (learnt that hte hard way).

All the internal details of how the server works that you learn are 95% useless until you need them, then they're essential, but just profiling is the best bang for the buck by a long way as a first step, and is always the first step (for me anyway).

> The same query's behaviour can be utterly different depending on the distribution of the data it runs on, and the parameters, and the currency of the statistics, and maybe memory and cpu pressure, and more

Yeah, but it is tough and time consuming to test all those parts. Ok, you can specifically try to benchmark by inserting skewed data, trying one and other parameter, but the extra time it takes from you... could be done for critical workloads probably. Maybe a better way would be just to force some particular index you see is suitable for that query, maybe use other hints that will hopefully prevent query optimizer biting down the road.

Luckily SQL Server features Intelligent Query Processing https://docs.microsoft.com/en-us/sql/relational-databases/pe... and Automatic Tuning https://docs.microsoft.com/en-us/sql/relational-databases/au... that tries to solve these pain points. Haven't had experience on how it performs in the wild, as I have to support older SQL Server installs.

> > The same query's behaviour can be utterly different depending on the distribution of the data it runs on, and the parameters, and the currency of the statistics, and maybe memory and cpu pressure, and more

> Yeah, but it is tough and time consuming to test all those parts.

GP was saying that because it is so tough and time consuming to test this, the best thing to do is just to look at the slowest queries and optimize them, instead of trying to identify problems by reasoning from first principles.

Thanks, that's indeed my point.

To clarify an earlier one where I said "The same query's behaviour can be utterly different depending on..." what I was saying was that the same query might be executed completely, totally differently depending on what the optimiser deems best. It might use a merge join with an index, or it might (depending on the data distribution) use a loop join and ignore the index(es) entirely. Same query, different execution.

This is why you should (almost) never use hints unless you know a lot more about the data than the DB server, which is rare (or unless the optimiser is being consistently stupid and very sub-optimal, that happens). If you add an index hint the DB will be forced to use it even if it reduces performance, and it can reduce performance a lot. If you force join orders explicitly you can catastrophically fuck performance - I think I've only ever done that once in production.

So profile, add indexes accordingly, but typically don't force them to be used; let the DB use them, or not, as it feels. Save hints for specially problematic situations.

Thanks for clarification :)

I see you deal with these issues too. In my case, SQL Server is providing storage for 3rd party application (Microsoft Dynamixs AX). Although it is customizable, but up to a point. I want to share some experience.

What we have is a huge covering index, consisting of 6+ key columns and many INCLUDE columns. The query executes millions of times per day, actually per hour (that implies query plan MUST be cached and query parametrized), the table itself is close to billion rows. SQL 2008 R2. And what SQL optimizer may sometimes think about - "hmm, that index takes up many bytes, let's use clustered index - not so effective on lookup, but anyways.". And then performance tanks, because, yeah, for that particular parameter value the reasoning was true, but not for 99% other cases. And in these cases, forcing particular index is a lifesaver. Along with a plan guides, which actually enable forcing that index on that query without touching the application.

So what I wanted to share? Sadly, even covering index in some cases may not be chosen by query engine.

OK...

If the covering ix is as almost fat as the table there may be little benefit in it. If the original table can just fit in the ram but the orig table + covering index together can't, they may be fighting for space in ram which means hitting disk which means slooow. But hard to diagnose from a distance.

> The query executes millions of times per day, actually per hour (that implies query plan MUST be cached and query parametrized) ...

Not at all! See below (Edit: I see what you're saying. Still, see below. Opt. 2 might be best here, but it depends)

> And then performance tanks, because, yeah, for that particular parameter value the reasoning was true, but not for 99% other cases

We had billion row searches and the cost of a recompile might be a second or two but the cost of a bad query (from previous, cached plan) can be vastly larger. Esp. here as you talk about it not suiting the other 99% of queries.

2 poss solutions:

1. Force a recompile every time. <https://docs.microsoft.com/en-us/sql/relational-databases/st...> (... WITH RECOMPILE clause). This really worked for us. Note that cpu is proportionately cheaper if you have multiple cores, and who doesn't these days. Recompiles are cheap on such machines. Recompile works very well IME!

2. More advanced. If you know the type of data the query then write identical queries in stored procs with different names, and consistently use a given proc for given expected parameter(s). Each proc compiles and stores its own query plan, so you have multiple query plans ready to go.

Further, make sure your stats are a) present and b) up-to-date. Bad stats = train-wreck query plans.

Further redux, don't assume that a covering index is all good news. I'd have to look at the query plan (edit: what I'm saying is multiple 'thinner' covering indexes may be better than one fat one).

1 & 2 may be complementary rather than totally exclusive but nevver used both together.

If you need any more, shout.

I mostly work the way you do, but still find explain helpful when my intuition fails. Most notably this usually happens when an index exists but the query manages to miss the index. Most often I experience this when it comes to dates and ORM-built queries that type cast.