Hacker News new | ask | show | jobs
by faho 1608 days ago
One very very simple fix is to mention the table first:

FROM this SELECT whatever

This already allows autocomplete for the attributes to work, and has an easier mental model - you think about the tables, then you think about their attributes. It also matches relational algebra better, where you'd do the projection (picking the attributes you want) at the end.

But anyway, simple cases being simple doesn't mean the language isn't horrible for more complex ones.

One thing I always complain about is join clauses making it easy to do the wrong thing (NATURAL JOIN) and annoying to do the correct thing (joining on the defined foreign keys).

7 comments

People have long joked about "yoda conditionals" ("if 5 == x", for example, instead of "if x == 5"), and flipping it in that order is the same thing: SELECT first is the same as "Get the fork from the drawer", where the flipped order actually sounds like something Yoda would say.
It is more like starting the recipe with "take sauce from point 6 and pour it over meat from point 10".

Very few people has problem with trivial queries like "SELECT x FROM y", but when query contains multiple joins or inner queries then having select at the beginning is visibly problematic.

What I did to get better at 'big' queries was start by writing out things like inner queries as CTEs, Table Parameters, or (if in oracle, lol) refCursors.

That's (sometimes!) less performant than the one big query, but you can then refactor into a single query if you so choose.

Yeah, it's slow goings at first, but you get pretty good at SQL in the process.

a[5] means "From array a, select element 5", and nobody has a problem with that. If anything, the English habit of describing things in little-endian fashion ("Take a fork from the drawer to the left of the sink in the large kitchen") adds a lot of cognitive overhead because you have to wait for the end to figure out your first step, and then reverse the order. A much more practical way of writing that would be "In the large kitchen, to the left of the sink, there's a drawer; take the fork from the drawer."
It's kind of nice to see which columns you're selecting, as they're always in the beginning
Right, that's a sort of counterpoint. It is nice to know that you're going to be getting a fork so you have context for the kitchen -> sink -> drawer instructions. Similarly, I expect in many cases, when you have a complex query, the fact that you know the result is going to be "userid, sum(score)", it makes the subsequent query easier to understand (as you know where it's got to end up). "Can you get me a fork? In the kitchen, next to the sink, in the drawer" might be even friendlier.
The question is if the syntax correspond to the logical order of operations.

In the query "SELECT foo + 2 FROM bar WHERE baz ORDER BY foo", the logical order is actually "FROM foo WHERE baz SELECT foo + 2 ORDER BY wawa" because of how each clause depends on the previous.

The SQL syntax is neither the logical order nor the direct reverse - it is just a random jumble.

SQL syntax is designed for reading, which is the majority of coding time. What gets selected is generally the thing most cared about so it goes first, where stuff comes from is the next most important so you get FROM and JOIN. Filtering and aggregation are next, because they are often hinted at by the select list anyway, then sorting is the least important so it comes at the end.
> What gets selected is generally the thing most cared about so it goes first

I disagree. When you're reading a query you often already know what you were trying to select, you just have a problem with a clause somewhere. This suggests that the select should go last, and this makes perfect sense as Haskell's comprehensions and C#'s LINQ are both way easier to work with than SQL.

You know what you want to select at the time of writing, but 3 months later when someone else needs to add column foo the first two thoughts are: is it already in the select and is it available to the select without more joins. I've rarely ever touched filtering or aggregating lines in an exiting query unless requirements completely changed, which is less common than wanting more values added to an existing query.
I disagree. I've used C#'s LINQ extensively, where the select is at the end and adding columns is trivial. It's a complete non-issue.

On the other hand having the select at the beginning has all kinds of problems for autocomplete, and syntactically obscures where you're selecting from and the clauses. I recommend you try LINQPad if you want experience with how much better this works:

https://www.linqpad.net/

> I've rarely ever touched filtering or aggregating lines in an exiting query unless requirements completely changed

Requirements change or bugs are discovered in the query. This is far more common than you imply.

As long as you have each operation in the expected order, then SQL syntax might make sense. The problem comes when you need something outside this template, e.g. a projection after an aggregation. The straightforward syntax becomes unnecessary complex and hard to read if you just venture a bit outside of the default template.

SQL syntax is like if all arithmetic expressions had to be addition followed by subtraction followed by multiplication. And if you didn't need to add anything you would just have to add 0.

It would perhaps be a good thing if one could write SQL clauses in their logical ordering; as [1] explains:

* The FROM clause: First, all data sources are defined and joined

* The WHERE clause: Then, data is filtered as early as possible

* The CONNECT BY clause: Then, data is traversed iteratively or recursively, to produce new tuples

* The GROUP BY clause: Then, data is reduced to groups, possibly producing new tuples if grouping functions like ROLLUP(), CUBE(), GROUPING SETS() are used

* The HAVING clause: Then, data is filtered again

* The SELECT clause: Only now, the projection is evaluated. In case of a SELECT DISTINCT statement, data is further reduced to remove duplicates

* The UNION clause: Optionally, the above is repeated for several UNION-connected subqueries. Unless this is a UNION ALL clause, data is further reduced to remove duplicates

* The ORDER BY clause: Now, all remaining tuples are ordered

* The LIMIT clause: Then, a paginating view is created for the ordered tuples

* The FOR clause: Transformation to XML or JSON

* The FOR UPDATE clause: Finally, pessimistic locking is applied

[1] https://www.jooq.org/doc/latest/manual/sql-building/sql-stat...

I recently had to train my new junior to level his SQL skills and I found this resource pretty helpful to make sense of this mess. https://learnsql.com/blog/sql-order-of-operations/

However I do also see the point of "SELECT first" just like a header you can infer the output data structure of a sub-expression without necessarily dive into the meat of it. It require a some brain training, but once you get there it oftentimes easier to navigate 100+ lines scripts by jumping from header to header (usually organized as CTE to make the code cleaner).

> and annoying to do the correct thing (joining on the defined foreign keys).

Maybe you’ve seen this thread already; a proposal with some alternatives on how to improve the situation for joining in foreign key columns, but in case not here is the link:

https://news.ycombinator.com/item?id=29739147

That proposal unfortunately requires you to name the foreign key constraint, which is quite unergonomic.

E.g. instead of the wrong `FROM a NATURAL JOIN b` you would use the correct `FROM a JOIN FOREIGN a.foo_fkey`, which not only needs that second name but now also loses the immediate naming of b. So e.g. autocomplete would have to look up the foreign key constraint to find out the second table. And it's still longer and harder to use than the natural join!

Most databases have one foreign key from a given table to another given table, and that simple case should be made easy to use.

There are multiple alternative syntaxes suggested in the proposal, one of them doesn’t use foreign key names, but instead the foreign key column names, similar to USING (), but without the ambiguity:

https://gist.github.com/joelonsql/15b50b65ec343dce94db6249cf...

This is my biggest, high-level thing too; in the syntax we use SELECT to mean a projection and FROM the selection.
> This already allows autocomplete for the attributes to work.

So does the other way around in several SQL engines.

If you write something a long the lines of select x.ID, y.NAME from bla.bla as x join hum.hum as y on x.ID = y.FK in msSQL you’ll get autocomplete on x. and y..

You’re right that it’s more intuitive to write the from first of course.

You can't possibly get autocompletion on x. and y. for that first select if you didn't write that from clause yet (or at least the autocompletion you'd get would not be tailored to those tables).
If you add the table name you could, and that's what "x." here is.

So yes, you can autocomplete

SELECT employee.Na<TAB>

to "employee.Name", but it requires you to type the table name "employee." first.

But with the from-first style you can autocomplete even bare column names - you know you have "name" (possibly even "employee.name" and "supervisor.name") and "employeeID".

Except that the table name isn't necessarily going to be that x. If you are matching employees with their managers then you have two employees tables in that expression so you have to work with aliases. At which point autocompletion breaks down.
You're familiar with CTE, right ?
Yes, you can work around some inadequacies of SQL by bolting more features on top.

That doesn't mean that the basic design of SQL isn't awkward.