| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by arronax 836 days ago
	Oracle DB is, or was, very close to that with its query profiles, baselines, and query patches. It wasn't automatic back in 2014 when I last worked on it, but all the tools were there. Heck, it was possible to completely rewrite a bad query on the fly and execute a re-written variant. I suppose it all stems from the fact that Oracle is regularly used under massive black boxes, including the EBS. Also, the problem with automatic indexing is that it only gets you so far, and any index can, in theory, mess up another query that is perfectly fine. Optimizers aren't omniscient. In addition, there are other knobs in the database, which affect performance. I suppose, a wider approach than just looking at indexes would be more successful. Like Ottertune, for example.

2 comments

dmurray 835 days ago

> Heck, it was possible to completely rewrite a bad query on the fly and execute a re-written variant.

Is there really such a thing as a bad query that can be rewritten to give the same results but faster? For me, that's already the query optimizer's job.

Of course there are "bad queries" where you query for things you don't need, join on the wrong columns, etc. And yeah the optimizer isn't perfect. But a query that you expect the query optimizer to "rewrite" and execute in an optimal way is a good query.

link

magicalhippo 835 days ago

Highly database dependent, as the query optimizer can only perform the tricks programmed into it.

For instance, we use SQLAnywhere at work (migrating to MSSQL), and it wasn't smart about IN (sub-query) so EXISTS was much faster.

Or, as I mentioned in another comment here, MSSQL performs much worse using a single OR in WHERE clause vs splitting into two queries and using UNION ALL, something which has no significant difference in SQLAnywhere.

For MSSQL I've found that even a dozen sub-queries in the SELECT part can be much faster than a single CROSS APPLY for fetching per-row data from another table.

Also the query might rely on certain assumptions that will in practice always hold in that application, but not in general. Especially around NULL, for example NOT IN vs NOT EXISTS[1].

[1]: https://www.mssqltips.com/sqlservertip/6013/sql-server-in-vs...

link

Scene_Cast2 835 days ago

One example is that in Presto, joins assume that left (IIRC) table is the big one you stream, and the right one is the small one you hash. One of the newer features was that in some cases, the optimizer is able to ignore the SQL order and correctly pick which table to hash and which one to stream.

link

legulere 835 days ago

Isn't that just basic table/index statistics to know which table is the smallest?

link

Scene_Cast2 835 days ago

Not if you're joining on a filtered table or a result of another join.

link

arronax 835 days ago

> Of course there are "bad queries" where you query for things you don't need, join on the wrong columns, etc.

You can inject a hint into the query, forcing it to use a plan that would not otherwise be used, for example. Although, fixing a plan through a baseline is way cleaner. Mostly, I just meant that as an extreme example of something you can do, not something you should do. And yes, the only reason to re-write the query is when the query itself is bad in that it asks for unnecessary data or misses a join column. Admittedly, that's an extremely dirty and dangerous thing to do, as it uncouples app from db, but it is possible.

link

atwebb 835 days ago

>Is there really such a thing as a bad query that can be rewritten to give the same results but faster? For me, that's already the query optimizer's job.

I can't tell if your disclaimer covers it but, yes, there are lots of bad queries that take a little bit of a re-write and run significantly faster. Generally it is someone taking a procedural vs set based approach or including things they don't need to try and help (adding an index to a temp table when it is only used once and going to be full scanned anyways). That's outside the general data typing/generally missing indexes.

link

Tostino 836 days ago

The problem of new indexes messing up otherwise good queries is something I've battled on and off for the past decade with Postgres. Definitely annoying.

link

rand_r 835 days ago

How would an index mess up another query? AFAIK indexes would only hurt write performance marginally per index, but most slow queries are read-only. I’ve tended to just add indexes as I go without thinking about it and haven’t run into issues, so genuinely curious.

link

adamcharnock 835 days ago

While I don’t recall running into issues either, I can certainly see that a new index could cause the query planner to make a different decision. And that decision could - in some cases - end up being worse that the previous behaviour.

I definitely have seen the query planing make some peculiar choices in the past.

link

Tostino 834 days ago

This is what I ran into. Often times they were indexes with a similar cost as another, and that caused issues.

I think the main index type that bit me are the ones created by exclusion constraints. Often times it looks to the planner like "the right" index to use, but there is another (btree) that is way cheaper...the exclusion constraint is just there to ensure consistency.

In those cases to fix things, I added a WHERE clause to the index (e.g. WHERE 1=1), and the planner wouldn't consider that index unless it saw that same 1=1 condition in the queries WHERE clause.

link

paulryanrogers 835 days ago

Indexes cost memory. It could push other indexes out of RAM. Or updates could increase IOPS, pushing them over an expensive threshold.

link