Hacker News new | ask | show | jobs
by arronax 789 days ago
Oracle DB is, or was, very close to that with its query profiles, baselines, and query patches. It wasn't automatic back in 2014 when I last worked on it, but all the tools were there. Heck, it was possible to completely rewrite a bad query on the fly and execute a re-written variant. I suppose it all stems from the fact that Oracle is regularly used under massive black boxes, including the EBS.

Also, the problem with automatic indexing is that it only gets you so far, and any index can, in theory, mess up another query that is perfectly fine. Optimizers aren't omniscient. In addition, there are other knobs in the database, which affect performance. I suppose, a wider approach than just looking at indexes would be more successful. Like Ottertune, for example.

2 comments

> Heck, it was possible to completely rewrite a bad query on the fly and execute a re-written variant.

Is there really such a thing as a bad query that can be rewritten to give the same results but faster? For me, that's already the query optimizer's job.

Of course there are "bad queries" where you query for things you don't need, join on the wrong columns, etc. And yeah the optimizer isn't perfect. But a query that you expect the query optimizer to "rewrite" and execute in an optimal way is a good query.

Highly database dependent, as the query optimizer can only perform the tricks programmed into it.

For instance, we use SQLAnywhere at work (migrating to MSSQL), and it wasn't smart about IN (sub-query) so EXISTS was much faster.

Or, as I mentioned in another comment here, MSSQL performs much worse using a single OR in WHERE clause vs splitting into two queries and using UNION ALL, something which has no significant difference in SQLAnywhere.

For MSSQL I've found that even a dozen sub-queries in the SELECT part can be much faster than a single CROSS APPLY for fetching per-row data from another table.

Also the query might rely on certain assumptions that will in practice always hold in that application, but not in general. Especially around NULL, for example NOT IN vs NOT EXISTS[1].

[1]: https://www.mssqltips.com/sqlservertip/6013/sql-server-in-vs...

One example is that in Presto, joins assume that left (IIRC) table is the big one you stream, and the right one is the small one you hash. One of the newer features was that in some cases, the optimizer is able to ignore the SQL order and correctly pick which table to hash and which one to stream.
Isn't that just basic table/index statistics to know which table is the smallest?
Not if you're joining on a filtered table or a result of another join.
> Of course there are "bad queries" where you query for things you don't need, join on the wrong columns, etc.

You can inject a hint into the query, forcing it to use a plan that would not otherwise be used, for example. Although, fixing a plan through a baseline is way cleaner. Mostly, I just meant that as an extreme example of something you can do, not something you should do. And yes, the only reason to re-write the query is when the query itself is bad in that it asks for unnecessary data or misses a join column. Admittedly, that's an extremely dirty and dangerous thing to do, as it uncouples app from db, but it is possible.

>Is there really such a thing as a bad query that can be rewritten to give the same results but faster? For me, that's already the query optimizer's job.

I can't tell if your disclaimer covers it but, yes, there are lots of bad queries that take a little bit of a re-write and run significantly faster. Generally it is someone taking a procedural vs set based approach or including things they don't need to try and help (adding an index to a temp table when it is only used once and going to be full scanned anyways). That's outside the general data typing/generally missing indexes.

The problem of new indexes messing up otherwise good queries is something I've battled on and off for the past decade with Postgres. Definitely annoying.
How would an index mess up another query? AFAIK indexes would only hurt write performance marginally per index, but most slow queries are read-only. I’ve tended to just add indexes as I go without thinking about it and haven’t run into issues, so genuinely curious.
While I don’t recall running into issues either, I can certainly see that a new index could cause the query planner to make a different decision. And that decision could - in some cases - end up being worse that the previous behaviour.

I definitely have seen the query planing make some peculiar choices in the past.

This is what I ran into. Often times they were indexes with a similar cost as another, and that caused issues.

I think the main index type that bit me are the ones created by exclusion constraints. Often times it looks to the planner like "the right" index to use, but there is another (btree) that is way cheaper...the exclusion constraint is just there to ensure consistency.

In those cases to fix things, I added a WHERE clause to the index (e.g. WHERE 1=1), and the planner wouldn't consider that index unless it saw that same 1=1 condition in the queries WHERE clause.

Indexes cost memory. It could push other indexes out of RAM. Or updates could increase IOPS, pushing them over an expensive threshold.