| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by dspillett 637 days ago

On readability, I often find aligning things in two columns is more readable. To modify the two examples in TFA:

    SELECT e.employee_id
         , e.employee_name
         , e.job
         , e.salary
      FROM employees e
     WHERE 1=1 -- Dummy value.
       AND e.job IN ('Clerk', 'Manager')
       AND e.dept_no != 5
         ;

and with a JOIN:

    SELECT e.employee_id
         , e.employee_name
         , e.job
         , e.salary
         , d.name
         , d.location
      FROM employees e
      JOIN departments d
           ON d.dept_no = e.dept_no
     WHERE 1=1 -- Dummy value.
       AND e.job IN ('Clerk', 'Manager')
       AND e.dept_no != 5
         ;

In the join example, for a simple ON clause like that I'll usually just have JOIN ... ON in the one line, but if there are multiple conditions they are usually clearer on separate lines IMO.

In more complicated queries I might further indent the joins too, like:

    SELECT *
      FROM employees e
           JOIN departments d
             ON d.dept_no = e.dept_no
     WHERE 1=1 -- Dummy value.
       AND e.job IN ('Clerk', 'Manager')
       AND e.dept_no != 5
         ;

YMMV. Some people strongly agree with me here, others vehemently hate the way I align such code…

WRT “Always specify which column belongs to which table”: this is particularly important for correlated sub-queries, because if you put the wrong column name in and it happens to match a name in an object in the outer query you have a potentially hard to find error. Also, if the table in the inner query is updated to include a column of the same name as the one you are filtering on in the outer, the meaning of your sub-query suddenly changes quite drastically without it having changed itself.

A few other things off the top of my head:

1. Remember that as well as UNION [ALL], EXCEPT and INTERSECT exist. I've seen (and even written myself) some horrendous SQL that badly implements these behaviours. TFA covers EXCEPT, but I find people who know about that don't always know about INTERSECT. It is rarely useful IME, but when it is useful it is really useful.

2. UPDATEs that change nothing still do everything else: create entries in your transaction log (could be an issue if using log-shipping for backups or read-only replicas etc.), fire triggers, create history rows if using system-versioned tables, and so forth. UPDATE a_table SET a_column = 'a value' WHERE a_column <> 'a value' can be a lot faster than without the WHERE.

3. Though of course be very careful with NULLable columns and/or setting a value NULL with point 2. “WHERE a_column IS DISTINCT FROM 'a value'” is much more maintainable if your DB supports that syntax (added in MS SQL Server 2022 and Azure SQL DB a little earlier, supported by Postgres years before, I don't know about other DBs without checking) than the more verbose alternatives.

4. Trying to force the sort order of NULLs with something like “ORDER BY ISNULL(a_column, 0)”, or doing similar with GROUP BY, can be very inefficient in some cases. If you expect few rows to be returned and there are relatively few NULLs in the sort target column it can be more performant to SELECT the non-NULL case and the NULL case then UNION ALL the two and then sort. Though if you do expect many rows this can backfire badly and you and up with excess spooling to disk, so test, test, and test again, when hacking around like this.