Hacker News new | ask | show | jobs
by jhallenworld 2076 days ago
I don't like either syntax. Wikipedia has this example:

QUEL:

    range of E is EMPLOYEE
    retrieve into W
    (COMP = E.Salary / (E.Age - 18))
    where E.Name = "Jones"
SQL:

    select (e.salary / (e.age - 18)) as comp
    from employee as e
    where e.name = "Jones"
I would prefer an operator syntax that directly mimics relational algebra. Something like:

    w = employee(name == "Jones")[comp = salary / (age - 18)]
So () is "where", [] is "project" (choose or create columns) and you can use * for join and + for union. The result is a table with a column named comp.
7 comments

They should implement something using straight functions, extremely spartan with no special syntax at all, and let everyone build their own favoured DSL over the top.

One of my major complaints about SQL is the syntax is so finicky that it is really hard to replace it with a [something -> sql] layer, because the something layer can't generate all the silly syntactic forms that SQL uses.

Eg, personal favourite, it is easy to have a dsl that translates

  select(y = fn(x)) -> select fn(x) as y
that then breaks down because it can't construct

  ??? -> select extract(month from x) as y
and that is the only syntax the SQL database decided to understand. There are too many cases like that that need special handling, especially once SQL dialect-specific stuff comes into play.
It should be noted that the expression like

    comp = salary / (age - 18)
is (strictly speaking) not part of the relational algebra because it is not using set operations (like join or union). It was added to the relational model because we hardly can process data without such expressions.

A better way to formally describe such calculated columns is to introduce functions and treat them as first class elements of the data model. In particular, function operations are better than set operations in these cases [1]:

- Calculating data using calculated columns. We describe a calculated column as a function without generating new relations (as opposed to select-from)

- Aggregating data. We can add an aggregate column without generating new relations (as opposed to groupby)

- Linking data. We can add derived link columns without generating new relations (as opposed to join)

This function-based approach to data modeling and data processing was implemented in Prosto [2] which demonstrates how many typical relational tasks can be solved using functions.

[1] Why functions and column-orientation? https://prosto.readthedocs.io/en/latest/text/why.html

[2] Functions matter! No join-groupby, No map-reduce. https://github.com/prostodata/prosto

One major strength of SQL is its readability - it reads so much like English that a non-technical stakeholder could conceivably understand queries. Do you not find that this is a valuable thing that's lost with relational-algebra-esque syntax?
I haven’t one time in my 20 years of development had a time where that readability mattered tho. SQL very quickly becomes too complex for people who don’t intimately understand the language to make any sense out of it. Show a layman an INNER JOIN and see if they can make any sense out of what’s happening...they’ll just do what they do in real life: ask an engineer.
I'm in full agreement. Trivial SQL is somewhat readable, but anything non-trivial is not, and the order of SQL queries, that there is essentially no relationship between the syntactic and semantic orderings, make it hard to understand.

QUEL is better there but still not great unless its execution semantics are different than SQL's (aka semantically does it filter before or after selection?)

Relational algebra notation is so much more concise and readable. We’ve mangled so many very simple concepts because someone decided that math is offensive or something, so any math notation is automatically “not fit for software.”

Computing is inherently tied to math. There’s no getting away from that.

There is a book, Applied Mathematics for Database professionals that uses a terse mathematical notation to describe queries and I remember actually kind of liking it. I especially like that they used it to describe the state of the database and the transforms.
This line of thinking is how you get APL and its offspring, J & k etc.

(and just to be clear, I'm neither for or against)

Unbiased samples should match typography

QUEL:

    range of e is employee
    retrieve into w (comp = e.salary / (e.age - 18))
    where e.name = "Jones"
SQL:

    select
    (E.Salary / (E.Age - 18)) as COMP
    from EMPLOYEE as E
    where E.Name = "Jones"

Now QUEL looks like modern language while SQL is jarred mess.
Also `into w` is

> To store the results of the retrieve in a new table, specify `into tablename` [1].

matching SQL:

    create table w as
    select (e.salary / (e.age - 18)) as comp
    from employee as e
    where e.name = "jones"
[1] http://docs.huihoo.com/ingres/9.3/QUELRef.pdf
I would reorder round and square brackets, since I may want to filter on computed/created columns, and the ordering makes it clearer.
It's up to you (and the query optimizer can reorder), but you have to make sure the column is available when you use it. Broken up:

I had:

    a = employee(name == "Jones")    a has name, salary and age

    w = a[comp = salary / (age - 18)]   w has comp
You want:

    b = employee[name, comp = salary / (age - 18)]   b has name and comp

    w = b(name == "Jones" && comp > 500)   w has name and comp
In one line:

    w = employee[name, comp = salary / (age - 18)](name == "Jones" && comp > 500)
I'm not sure if you misunderstood them or I misunderstood you, but there's no legitimate reason you can't filter on computed columns as is:

  w = employee(name == "Jones")[comp = salary / (age - 18)](comp > 2000)
That sounds like a cool language.