Hacker News
by snthpy 1460 days ago
That's a really good question! (and one we should probably answer explicitly in the [FAQ](https://prql-lang.org/faq/) rather than just implicitly)

The README states that "PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement. Like SQL, it's readable, explicit and declarative. Unlike SQL, it forms a logical pipeline of transformations, and supports abstractions such as variables and functions. It can be used with any database that uses SQL, since it transpiles to SQL."

What that means to me is that PRQL more naturally maps onto how I think about and work with data.

Say I have a dataset, `employees`, and I want to answer a question about it, such as: for US employees, what are the maximum and minimum salaries, and how many employees are there?

    from employees
    filter country == "USA"                       # Each line transforms the previous result.
    aggregate [                                   # `aggregate` reduces each column to a single value.
      max salary,
      min salary,
      count,                                      # Trailing commas are allowed :)
    ]

Moreover, after each line you have a valid pipeline, which you can transform further by adding more steps. This more closely matches how people construct data pipelines in R using dplyr/tidyverse and in Python using pandas.
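For comparison, here is a rough sketch of the kind of SQL the pipeline above corresponds to, run against an in-memory SQLite table from Python. The SQL is hand-written for illustration and is not actual PRQL compiler output; the table rows and the `name` column are made up.

```python
import sqlite3

# Hypothetical sample data; only `country` and `salary` come from the PRQL example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, country TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", "USA", 120), ("Bob", "USA", 90), ("Cy", "UK", 200)],
)

# Hand-written equivalent: `filter` becomes WHERE, `aggregate` becomes
# aggregate functions in the SELECT list.
sql = """
SELECT MAX(salary), MIN(salary), COUNT(*)
FROM employees
WHERE country = 'USA'
"""
result = conn.execute(sql).fetchone()
print(result)
```

Note that the SQL has to be read "inside out" (SELECT is written first but evaluated last), whereas the PRQL version reads top to bottom in the order the steps are applied.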

If you find that it doesn't map well onto how you think about data pipelines, then please let us know, as we're constantly looking for more real-world examples to help us iterate on the language!

2 comments

One benefit of SQL is that the Database Engine will do the hard work of optimizing the query plan.

Do you think the SQL compiled by PRQL could be optimized by the database engine as effectively as hand-written SQL?

As you said, let the Database Engine do the hard work of optimizing the query plan for you.

I currently have no reason to believe that PRQL-generated SQL would be any worse than hand-written SQL. That said, I don't think we've looked at any ways of passing hints to the query planner yet. We're always open to suggestions!

In the worst case, you have full access to the generated SQL, and for absolutely crucial queries you can hand-modify that SQL. At least PRQL might have saved you the trouble of writing a cumbersome window function or something like that (see, for instance, the example of picking the top row by some GROUP BY expression).
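To give a sense of the kind of window-function SQL meant here, this is a minimal sketch of "top row per group" written by hand and run through SQLite from Python. It is one common way to write such a query, not necessarily what PRQL emits; the table contents are invented, and window functions assume SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, country TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", "USA", 120), ("Bob", "USA", 90), ("Cy", "UK", 200), ("Di", "UK", 150)],
)

# Rank rows within each country by salary, then keep only the top-ranked row.
# Requires window-function support (SQLite >= 3.25).
top_sql = """
SELECT name, country, salary
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY country ORDER BY salary DESC) AS rn
    FROM employees
)
WHERE rn = 1
ORDER BY country
"""
top_rows = conn.execute(top_sql).fetchall()
print(top_rows)
```

The subquery plus `ROW_NUMBER()` boilerplate is the part that tends to be tedious to write and easy to get wrong by hand.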

This reminds me of Kusto. I'm not sure how it compares to SQL in general, but it was really fun to work with for querying Azure Application Insights.