Hacker News new | ask | show | jobs
by exabrial 2076 days ago
Really the the only issue I have with SQL is NULL != NULL. This creates an impedance mismatch with most languages...

MySQL sort of solves this problem with a <=> operator, which I wish was the default for ORMs to use.

There are a lot of other minor nitpicks but a lot of criticisms come down to the actual RDMS not SQL itself.

8 comments

Here are some major issues [1]:

SQL’s shortcomings can be grouped into these categories:

- lack of proper orthogonality — SQL is hard to compose;

- lack of compactness — SQL is a large language;

- lack of consistency — SQL is inconsistent in syntax and semantics;

- poor system cohesion — SQL does not integrate well enough with application languages and protocols.

[1] We Can Do Better Than SQL: https://www.edgedb.com/blog/we-can-do-better-than-sql/

I consider myself a fan of SQL, but my nitpick is more around named calculated columns. For instance:

    SELECT AVG(col1) OVER (PARTITION BY col2) AS partcol1
    FROM tbl
    WHERE AVG(col1) OVER (PARTITION BY col2) > 10.0
I wish I could just do:

    SELECT AVG(col1) OVER (PARTITION BY col2) AS partcol1
    FROM tbl
    WHERE partcol1 > 10.0
But I can't, because the WHERE clause is processed before the SELECT clause.

So if I have a bunch of these and want a convenient way to work with the named columns, I have to wrap them into a common table expression:

    WITH cte AS (
        SELECT AVG(col1) OVER (PARTITION BY col2) AS partcol1
        FROM tbl
    )
    SELECT partcol1
    FROM cte
    WHERE partcol1 > 10.0
So wordy.
Yeah it's mentioned in this blog post https://blog.jooq.org/2014/01/06/the-sql-languages-most-miss... that SQL is missing Common Column Expressions, and that using Common Table Expressions reuse columns is simply a band-aid fix over the lack of those.

For example, your example would have been more naturally expressed as

    SELECT partcol1
    FROM (tbl WITH partcol1 AS AVG(col1) OVER (PARTITION BY col2))
    WHERE partcol1 > 10.0
Wow thanks. I've never heard it expressed as a Common Column Expression. It makes sense to include it in the FROM statement because that's usually parsed and executed first.
My bigger concern regarding NULLs is that its a ternary logic shoved into a binary logic system, and it all invisibly becomes nonsense when your dataset has NULLs in it, and you don't explicitly address it

WHERE col1 > col2

is wrong, and it'll break in terrible ways

and in the face of negation + NULLs, everything falls apart[0], giving you both false positive and false negatives in your answerset, and everything will look perfectly fine.

Every column being made NOT NULL is the only sane solution.

[0] https://databasetheory.org/sites/default/files/2018-05/03_pr...

This is basically the correct take, but I would say that NULL is a real and useful concept which databases should have.

That implies that comparison and boolean operators should be ternary, which would be ugly and confusing: but an ugly and confusing that reflects reality.

This would give us three "greater than" operators:

   WHERE col1  > col2
   WHERE col1 ?> col2
   WHERE col1 >? col2
The first is always false for NULL, the second always picks the NULL column, the third never picks the NULL column. It doesn't seem coherent to order two NULLs, so that would always be false. I'm not attached to the syntax, which is intended to be illustrative.

The most important of such operators would be

   WHERE col1 ?= col2
which coerces two NULLs to be equivalent. It says "yes NULLs aren't comparable, but for this query, I want to treat them as equal". Because that is only usually true, not invariably so.
you're not wrong... Null by itself is sort of crazy in a binary system, but I see that as more a problem with binary representing reality rather than the other way around. It's a handy abstraction in that sense but a nightmare for systems level programming.
The concept is useful, the implementation is not. You really just want better ergonomics for enums, and encode the many ways NULL is meant to mean that way
I just wish they had flipped from and select around.
Or "set" and "where"
It's not about flipping. The two concepts are actually commutative. You could in theory create syntax that looks like this:

  FROMCLAUSE * SELECTCLAUSE * WHERECLAUSE = SQLEXPRESSION
  SELECTCLAUSE * WHERECLAUSE * FROMCLAUSE = SQLEXPRESSION
  ...
The issue is that not only does SQL syntax force an artificial order on these clauses, but that these clauses Cannot be decomposed to be used elsewhere. I cannot reuse a WHERECLAUSE or a SELECTCLAUSE in another expression.
Some detail here. What goes on in relational algebra is that the FROMCLAUSE is encoded into an axiomatic primitive called a RELATION and you get stuff like this:

   SELECTCLAUSE(WHERECLAUSE(relation)) = relation
   WHERECLAUSE(SELECTCLAUSE(relation)) = relation
Basically every operation in relational algebra produces a primitive of type RELATION which allows for all operations to be composed like unix pipes.

My example in the previous post has a flaw where it's not clear what:

   SELECTCLAUSE * WHERECLAUSE = ????
will output because there's no meaning to a SQLEXPRESSION without a FROMCLAUSE. But hopefully it illustrates the point. I'm putting this here for anyone who's nitpicky about the details. The relational algebra syntax is much more elegant.
Pascal's book [1] highlights various issues with the language & is worth reading if you are not familiar with it.

Example:

select * from customers c inner join orders o on o.order_id = c.customer_id

Legal syntax but clearly incorrect.

[1] https://books.google.co.uk/books?id=t9ZQAAAAMAAJ&source=gbs_...

Do you mean “incorrect” in the sense that comparing a customer_id to an order_id is semantic nonsense?

I agree, and I do wish SQL had stronger typing so the parser could warn you before your query silently runs off the rails. For example, in Oracle, I believe the following is legal, but I wish it wasn’t:

   select * 
   from my tab 
   where 1 = ‘1’
Next you're going to complain about NaN != NaN.
The SQL standard also defined an operator for that: IS DISTINCT FROM
What about the main point of the article that SQL is not composable.
CTEs (common table expressions) and views definitely do help with this, though they are new-ish where they exist and often have optimization issues. But being able to use them extensively in a newer database where they work well helps this quite a bit.
Tables are composable but the expression itself cannot be decomposed. I cannot reuse a where clause somewhere else; that is the fundamental problem the article addresses.