Hacker News new | ask | show | jobs
by i_s 1524 days ago
At my work, we often parse and rewrite a query before handing it off to SQL Server, because there are a lot of cases where Microsoft misses obvious optimizations. Sometimes there are also optimizations we can do because of things we know at compile time, but don't fit in the type system of SQL. The impact varies all the way from just shaving off 10% of the execution time, to changing some queries from timing out in a web request to executing in a few hundred milliseconds.

A few examples:

- Inlining scalar function calls (less impactful now with Sql Server 2019)

- Removing joins from a query when we know it won't impact the number of records

- Deepening where conditions against derived tables

- Killing "branches" of union queries when they can be determined to not matter statically

Yes, we could write the queries that way in the first place, but it would make them harder to compose, more verbose, and harder for the programmer to communicate intent. Not to mention, often the user can impact what the query will be, making it less feasible.

2 comments

Hmm? This doesn't make much sense. MSSQL does these already I think (except the first)

.

- Inlining scalar function calls (less impactful now with Sql Server 2019)

write them as table valued funcs and it'll inline them for you (ugly but it works and is easy)

.

- Removing joins from a query when we know it won't impact the number of records

That's just tree pruning. It does that

.

- Deepening where conditions against derived tables

If I understand you that's predicate pushdown. MSSQL does it well.

.

- Killing "branches" of union queries when they can be determined to not matter statically

Not sure what you mean. Can you give an example?

> This doesn't make much sense. MSSQL does these already I think (except the first)

We tested all of these before taking the time to do the rewriting, and no, either they don't, or they way they did it isn't good enough. You can say I'm lying if you want, I'm not going to spend time arguing about it.

You could post some working examples and show them failing.
> less impactful now with Sql Server 2019

Could I ask what your process is for detecting whether a rewrite rule is still useful in subsequent versions of the DBMS? Do you read the release notes and test things out manually, or do you have an automated A/B test thing going on?

Additionally, have you ever had a rewrite rule change from being beneficial to being detrimental after upgrading versions? If so, how did you detect that?

Thanks!

Edit: Also, what considerations do you have for rewrite rule order? Do you find that it makes a significant difference in practice?

To check if rewrite rules are still helpful, yea - we just read release notes and test manually.

We've never had a rule change from being beneficial to detrimental. For most of them, I don't think that would be possible, because they just involve giving Sql less irrelevant things to chew on. For a few of them, like the manual scalar function inlining, I could see that being possible, so we will just need to keep checking.

For rewrite rule order, I guess we just do the ones that can enable other optimizations first. So far, that has been pretty simple to determine. For example, when we trim joins from inner queries, we first trim their selections (depending on what is actually used in the outer queries).

> We've never had a rule change from being beneficial to detrimental

Whatever you're doing is really confusing me (see my prev post). A multi-thousand line piece of the SQL I wrote was tested on mssql 2019 and there was a blatant perf. fuckup from it's previous home on mssql 2016 (or was it 2014). Poss. down to the new cardinality estimator, I dunno.

> For example, when we trim joins from inner queries, we first trim their selections (depending on what is actually used in the outer queries).

If you looked at the query plans you will see this happens automatically. And very reliably because it is easy (indeed, quite straightforward) to do automatically.

Also... 'we just read release notes' - these optimisations are not documented (except maybe in one place which they carefully undocumented after the 1st release) because these are trade secrets. The optimiser is one of the most important and carefully guarded parts of mssql - it's not in the notes and never will be.

No offence but... seriously, wat???

Thanks! Super interesting.