Hacker News new | ask | show | jobs
by alquemist 2076 days ago
> … The language (SQL) is not very composable. This is a fact that most SQL users are not aware of. The relational algebra that SQL is based on is absolutely composable but SQL is not due to the inherent limitation of the language (as it was designed to be natural language-like). When you write "select x from a where z", you are actually building something along the lines of "from a" => "where z" => "select x" in the algebra and you can actually compose each portion separately. If you are familiar with dplyr, Spark or pandas you would get this instantly.

Hmmm. Not a big lover of SQL, but this is a bit imprecise. While the unit of composability is slightly smaller for from=>where=>select (expression) vs select+from+where (subquery), in practice they both encode the same fundamental compositional principles, based on relational algebra. Any from=>where=>select query can be translated into a select+where+from query almost 1:1, if only via:

    from t ::= select * from t
    q => where p ::= select * from q where p
    q => select xs ::= select xs from q 
Sometimes the select+where+from ends up more verbose, sometimes there is more brain twisting to grok a given select+where+from query, but that's not a composition limiting factor. Granted, the readability of SQL is sometimes lacking, but it is fully capable to compose recursive relational algebra queries.
3 comments

You're arguing about capability, whereas the post is arguing about the grammar itself. What you're describing is how to build a separate domain specific language that compiles to SQL (which there are quite a few of; e.g. C#'s Linq).

Quite a few query languages are equally capable (including, surprisingly, quite a few so-called graph languages, provided the SQL dialect provides a transitive closure), but SQL as a language has some undesirable properties (most of which are trade-offs for the fact that basic SQL is very easy to parse).

This argues that composability is the most important consideration of a domain specific language. But I think, as proved by SQL taking over, the UX is more important. Any programming language must consider the programmer and its humanity and natural way of thinking and reasoning to win in getting the most adoption and mindshare. Usability does matter as the user of any programming language is a human being.

Was QUEL also more usable and natural to people? As well as being more amenable to composition?

There are a handful of examples on Wikipedia: https://en.wikipedia.org/wiki/QUEL_query_languages. One example:

    retrieve (a=count(y.i by y.d where y.str = "ii*" or y.str = "foo"), b=max(count(y.i by y.d))) 
Not a particularly clear 'jumps at you' obvious semantic:

* Are a and b aggregation functions or window functions? If aggregations, how do they compose if the 'by' scopes are different?

* What does max(count(... by ...)) mean? What is the aggregation (window?) scope of max?

* How would an outer where clause compose? What is the evaluation order?

> The following table lists aggregate functions:

> count() Number of entries in column

> max() Maximum value in column

> The by clause causes the function to return a set of results, as opposed to a single result. One result is returned for each grouping specified by the by clause. Think of by as meaning "for each."

I assume it evaluates like retrieving set and scalar.

      a   |   b
    --------------
    set 1 | scalar
    set 2 | scalar

[1] http://docs.huihoo.com/ingres/9.3/QUELRef.pdf
Assuming we key a and b by y.d and generalizing b to sets, we are reaching the limits of the relational model: there is no good way to represent multiple sets associated with a given key, we need independent tables to do so. Neither cross product nor null padding is a good way to represent the schema {y, a[], b[]}.

      y.d |  a   |  b
    -------------------
      y0  |  a0  |  b0
      y0  |  a0  |  b1
      y0  |  a1  |  b0
      y0  |  a1  |  b1
      y0  |  a2  |  b0
      y0  |  a2  |  b1

    
      y.d |  a   |  b
    -------------------
      y0  |  a0  |  b0
      y0  |  a1  |  b1
      y0  |  a2  |  null

    # create table y (i int, d int);
    # insert into y values (1, 1), (1, 2), (2, 1);
    # select *, (select count(*) from (select count(y.i), y.d from y group by y.d) _) as foo
      from (select count(y.i), y.d from y group by y.d) _ ;
    
     count | d | foo 
    -------+---+-----
         1 | 2 |   2
         2 | 1 |   2
    (2 rows)
    
By the way great example how unwieldy SQL is. A bit better with CTE:

    # with bar as (select count(y.i), y.d from y group by y.d)
      select *, (select count(*) from bar) as bar from foo;
Nice! Should be easy to do max instead of count. Then top instead of max. Then top3 breaks the model.
Subqueries, named VIEWs, CTEs, etc all make SQL compostable ?
Yes, but it's extremely clunky. The number of times I've had to write out a whole chain of CTEs just because I wanted to apply one window function to the output of another.

In which context, "compostable" is a fantastic Freudian typo.

Many of the modern constructs weren’t there in 1984’s SQL, which this story describes; they were added later to address composability issues.