Hacker News new | ask | show | jobs
by maximilianroos 1003 days ago
[PRQL dev here]

I strongly think we should have the best examples of SQL to compare against. I've ironically made this complaint for other libraries, so I'm alarmed that folks think we might have done the same.

We would take a PR for the first example if folks think that's better spacing. (I think the spacing is the only difference?)

For the second — `USING` isn't fully equivalent to `ON`. There are discussions on GH (https://github.com/PRQL/prql/issues/723) as to which we should compile to. In the meantime, we'd take a PR on anything that is equivalent.

2 comments

> For the second — `USING` isn’t fully equivalent to `ON`.

In terms of portability because its not supported by, e.g., MSSQL, sure.

In terms of its semantics, though:

  t1 [LEFT/RIGHT/INNER] JOIN t2 USING col
Is fully equivalent to:

  t1 [LEFT/RIGHT/INNER] JOIN t2 ON (t1.col == t2.col)
So for a comparison to SQL as used by most RDBMSs (rather than MSSQL specifically), rather than “what should PRQL compile to”, USING is quite appropriate.

It may be that the intent of the homepage pairing is to highlight the actual compilation result and not provide a comparison to SQL-as-it-would-manually-be-written, but the presentation doesn’t make it clear that that’s the purpose.

I posted a link to the issue discussing this on the repo. That gives a good overview to those interested in this issue.

To take one point from there:

> Is fully equivalent to:

They're not fully equivalent — `USING` combines the two columns into a single column, `ON` doesn't.

USING just presumes the columns have the same name on both ends. There is no logical "combining", only a lexical one. It's why many DB admins/engineers use "foo_id" instead of "id" in their schemas as a rule. (Notably, also one of the reasons why many DB admins/engineers hate ORMs and other table generators that name id columns generically as "id".)

The person you are responding to is likely in this cohort. You're correct that they aren't exactly equivalent, but that's an artifact of your table definition, not the language. It likely seemed perfectly reasonable to the commenter that this naming was a trivial detail in the scope of the greater conversation.

    WITH t AS (SELECT 1 x)
    SELECT x FROM t JOIN t u ...;
That breaks if you fill in the ... with "ON t.x = u.x" because there are two columns called "x", but works with "USING (x)" because they get collapsed into a single column. I think it makes sense to say those aren't fully equivalent.
OK, so we agree they're not fully equivalent
> We would take a PR for the first example if folks think that's better spacing. ... we'd take a PR on anything that is equivalent.

"We'd take a PR" is a great line for someone who's already sold on the concept, but it's not super helpful for flaws in your marketing. Reading OP, it doesn't sound like they trust you enough to find it to be worth their time fixing your examples for you.

Yes, this is fair.

Though I'm not sure how to find consensus on what is the best representation of SQL without asking folks to make a reviewable suggestion.

I had thought the examples are currently in their clearest state — I definitely don't think it's obvious that removing line-breaks makes it look clearer (which is the full extent of the feedback IIUC...)

I opened an issue on this here: https://github.com/PRQL/prql/issues/3518