Hacker News new | ask | show | jobs
by wackget 1003 days ago
Yeah the syntax comparison is deliberately misleading.

They style it as "4 lines vs 10 lines!" when it's actually 4 lines vs 4 lines.

   # PRQL
   from employees
   select {id, first_name, age}
   sort age
   take 10

   # Misleading SQL
   SELECT
     id,
     first_name,
     age
   FROM
     employees
   ORDER BY
     age
   LIMIT
     10

   # Actual SQL
   SELECT id, first_name, age
   FROM employees
   ORDER BY age
   LIMIT 10
The join example is similarly deceptive:

   # PRQL
   from employees
   join b=benefits (==employee_id)
   join side:left p=positions (p.id==employees.employee_id)
   select {employees.employee_id, p.role, b.vision_coverage}

   # Misleading SQL
   SELECT
     employees.employee_id,
     p.role,
     b.vision_coverage
   FROM
     employees
     JOIN benefits AS b ON employees.employee_id = b.employee_id
     LEFT JOIN positions AS p ON p.id = employees.employee_id

   # Actual SQL
   SELECT employees.employee_id, p.role, b.vision_coverage
   FROM employees
   JOIN benefits b USING employee_id
   LEFT JOIN positions p USING employee_id
Nonsense.
5 comments

It is also (for me) quite a bit noisier and less readable than SQL. The join example has multiple different syntactic symbols, for no clear intuitive reason (= vs ==, {} vs (), side:left vs p=positions).
I actually like it; it makes sense at a glance, without having read a single line of the official documentation. But I think people should be free to use either syntax -- method or query -- as they see fit, without judgment of which one is better.
[PRQL dev here]

I strongly think we should have the best examples of SQL to compare against. I've ironically made this complaint for other libraries, so I'm alarmed that folks think we might have done the same.

We would take a PR for the first example if folks think that's better spacing. (I think the spacing is the only difference?)

For the second — `USING` isn't fully equivalent to `ON`. There are discussions on GH (https://github.com/PRQL/prql/issues/723) as to which we should compile to. In the meantime, we'd take a PR on anything that is equivalent.

> For the second — `USING` isn’t fully equivalent to `ON`.

In terms of portability because its not supported by, e.g., MSSQL, sure.

In terms of its semantics, though:

  t1 [LEFT/RIGHT/INNER] JOIN t2 USING col
Is fully equivalent to:

  t1 [LEFT/RIGHT/INNER] JOIN t2 ON (t1.col == t2.col)
So for a comparison to SQL as used by most RDBMSs (rather than MSSQL specifically), rather than “what should PRQL compile to”, USING is quite appropriate.

It may be that the intent of the homepage pairing is to highlight the actual compilation result and not provide a comparison to SQL-as-it-would-manually-be-written, but the presentation doesn’t make it clear that that’s the purpose.

I posted a link to the issue discussing this on the repo. That gives a good overview to those interested in this issue.

To take one point from there:

> Is fully equivalent to:

They're not fully equivalent — `USING` combines the two columns into a single column, `ON` doesn't.

USING just presumes the columns have the same name on both ends. There is no logical "combining", only a lexical one. It's why many DB admins/engineers use "foo_id" instead of "id" in their schemas as a rule. (Notably, also one of the reasons why many DB admins/engineers hate ORMs and other table generators that name id columns generically as "id".)

The person you are responding to is likely in this cohort. You're correct that they aren't exactly equivalent, but that's an artifact of your table definition, not the language. It likely seemed perfectly reasonable to the commenter that this naming was a trivial detail in the scope of the greater conversation.

    WITH t AS (SELECT 1 x)
    SELECT x FROM t JOIN t u ...;
That breaks if you fill in the ... with "ON t.x = u.x" because there are two columns called "x", but works with "USING (x)" because they get collapsed into a single column. I think it makes sense to say those aren't fully equivalent.
OK, so we agree they're not fully equivalent
> We would take a PR for the first example if folks think that's better spacing. ... we'd take a PR on anything that is equivalent.

"We'd take a PR" is a great line for someone who's already sold on the concept, but it's not super helpful for flaws in your marketing. Reading OP, it doesn't sound like they trust you enough to find it to be worth their time fixing your examples for you.

Yes, this is fair.

Though I'm not sure how to find consensus on what is the best representation of SQL without asking folks to make a reviewable suggestion.

I had thought the examples are currently in their clearest state — I definitely don't think it's obvious that removing line-breaks makes it look clearer (which is the full extent of the feedback IIUC...)

I opened an issue on this here: https://github.com/PRQL/prql/issues/3518
Also there's no need to capitalise, so the 4 lines vs. 4 lines is actually just 'order by & limit' vs. 'sort & take'. NBD really, but it's probably nicer in more complex examples.

But I wanted about this the last time PRQL was submitted in its own right - https://news.ycombinator.com/item?id=36869376 - so I'll leave it there!

Looks like they are showcasing the output of the compiler, not the hand written SQL.
PRQL is a solution in search of a problem. Lacking such, the problem had to be fabricated.
SQL has many problems that PRQL ostensibly addresses. Query reuse, composability, abstraction and more.
Query reuse, composability and abstraction are all sufficiently solved with views in my experience.
Then your experience doesn't extend to dynamic queries composed of query fragments, which is almost universal to all e-commerce sites.

Views are a degenerate, second-class form of query abstraction, as are common table expressions and the numerous other features added to SQL over the years to paper over the lack of proper first-class query abstractions.

It's like saying a programming language doesn't need first-class functions because we have disjoint unions and we can perform defunctionalization by hand. Strictly true, but really missing the point.

When viewed as a DSL for set theory, views, CTEs, set-returning functions, et al are indeed proper first-class query abstractions.

When viewed through the lens of general purpose imperative or functional programming languages, it's easy to see how it can be seen as falling short.

I'll admit much of the tooling and driver APIs leave a lot to be desired.

Some tools do make good efforts though such as nested fragments in this driver.

https://github.com/porsager/postgres#building-queries

> When viewed as a DSL for set theory, views, CTEs, set-returning functions, et al are indeed proper first-class query abstractions.

It's still lacking even then. You can't assign a set to variable for instance, and then use that variable in other definitions or queries.