Hacker News new | ask | show | jobs
by dicroce 2923 days ago
so I was trying to figure out why a query was slow the other day... it was a nasty query with like 14 joins... I used explain and saw that it was a mess... now in my case I was able to switch to outer joints and nest related joins and got it fast.. but I had some interesting thoughts.

In SQL, indexes are implicit.. they are used if available but it's easy get a large query to scan sometimes when it shouldnt... what if there was a different query language with explicit index syntax.. I think you'd get a lot more predictable performance.

4 comments

Two reasons to not have indexes in the query:

1. Query expresses the result it produces, not the method that was used to obtain it. Semantic vs implementation. It may be a pain to write, but it will be easier to read later.

2. DBA could add/drop indexes on the fly to tune performance of a live system without making any changes to the application code. And being 100% certain he is not changing the semantics of what's going on.

As others noted, if you must you can use query hints for force particular index to be used for a particular operation. MSSQL also allows to pin down a query plan you like for a given query so that it doesn't drift away later due to environment changes.

I agree it is sometimes a pain to force SQL to use the index you wanted it to use.

I’ve never worked anywhere where I had to worry about DBAs running around dropping indexes. The main reasons not to build an index are usually storage and write overhead. Every index a table has means you have to do another write operation on every insert, which can really start to add up. They can also add significant overhead to any migration operation that happens to require an index rebuild.

In my experience, the most common reason for an optimizer choosing not to use an existing index, is out of date statistics. For those who aren’t aware, the database collects table statistics for things like cardinality, number of distinct values, etc... This is the information the optimizer uses when it’s building a plan. If they get out of date the optimizer will start to come up with nonsense plans. Even worse, if your stats get too out of date, you can become scared to update them, because a new set of stats can potentially change the plans built for every single query in ways that are hard to predict.

As others have stated, you can put index hints directly into your queries, but this should be avoided as they’re hard to maintain. Most ‘enterprise’ RBDMS also have some form of plan management, but this should be avoided even more, as managed plans permanently bypass the optimizer, which is even harder to maintain.

So, the way I think this should work is that there should be a way of addressing the table sources in a query from an application that has them parsed, and then externally (i.e., in the application) provide planning hints to the compiler.

Something like (in some terrible pseudocode):

    q = parse_query("...");
    q.hint(FIRST_TABLE, "a");
    q.hint(INDEX, "b", "b_idx1");
    c = q.compile();
    r = c.run(...);
MySQL has or had a way of forcing index use, and while it was very occasionally a life-safer, it was much less useful than you might think, as often when a DB engine falls back to sequential scan, it's for a reason (e.g. the query planner might have found that the indexes don't cover the columns you need, and the number of indexed lookups into the table that is needed are costly enough that a sequential scan might end up being faster, for example).

It was useful occasionally "back in the day" when MySQL's query planner was really bad, but today it will mostly appear useful if there's something subtly wrong with your config. I don't use MySQL much any more, but on Postgres one typical mistake might be that the costs configured for the query planner doesn't match your hardware (e.g. if your seek cost is configured to be high enough relative to sequential reads, sequential scans starts to look really good even when you need only a small portion of the data; principle will be the same on MySQL but I don't know if you have the same control over the query planner costs or not).

> what if there was a different query language with explicit index syntax..

There is, it's a feature in MySQL called Index Hints [1].

[1] https://dev.mysql.com/doc/refman/8.0/en/index-hints.html

Or oracle has it

A DBA can even sit there as queries fly past, and add hints on the fly.

And then you change a query from "select" (lowercase) to "SELECT" (uppercase), and query plans break and you break production.

Fun times

Could you elaborate why changing select from lowercase to uppercase would break anything?
Stored procedures where master and slave are on different OSs and/or case-sensitivity settings are not the same?

Just a guess.

I would suggest that there's potentially something you need to look at with your database schema - a couple dozen joins shouldn't be causing any problems you have to think about.

Part of this is because of the way a well-normalized database is organized. Most databases have a few large tables and many smaller tables. So in the general case, most of your joins will be against smaller tables. Joins with larger tables are usually very fast, as long as the fields you join on are indexed (and you're not doing a CROSS JOIN or something.) The other thing that helps (which it sounds like you did by "nesting related joins") is to always think about limiting (filtering) the datasets you're joining against at as many stages as possible; that way you're always doing the least amount of work necessary, and it's usually conceptually simpler to read and understand.

As others have said, most databases do have index hinting as part of the query language. However, in my (long) experience, you should almost never use it. Index hints should be a huge code smell.

I use UUIDs as primary keys, you insensitive clod!