|
That's a really good question, and one we should probably answer explicitly in the [FAQ](https://prql-lang.org/faq/) rather than just implicitly. The README states that
"PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement. Like SQL, it's readable, explicit and declarative. Unlike SQL, it forms a logical pipeline of transformations, and supports abstractions such as variables and functions. It can be used with any database that uses SQL, since it transpiles to SQL."

What that means to me is that PRQL maps more naturally onto how I think about and work with data. Say I have some dataset, `employees`, and I want to answer some questions about it, such as: for US employees, what are the maximum and minimum salaries, and how many employees are there?

    from employees
    filter country == "USA"   # Each line transforms the previous result.
    aggregate [               # `aggregate` reduces each column to a value.
      max salary,
      min salary,
      count,                  # Closing commas are allowed :)
    ]

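
For comparison, here is a rough sketch of the SQL that this pipeline corresponds to, wrapped in a small Python script so it can be run against an in-memory SQLite database. The SQL is hand-written to mirror the PRQL steps and is not actual output from the PRQL compiler; the sample rows, the column aliases, and the `us_salary_summary` helper are all made up for illustration.

```python
import sqlite3

# Hypothetical sample rows; the real `employees` table is not shown in this thread.
ROWS = [
    ("Ann", "USA", 120000),
    ("Bob", "USA", 95000),
    ("Cho", "KOR", 88000),
    ("Dee", "USA", 101000),
]

# Hand-written SQL that mirrors the PRQL pipeline:
#   from employees          -> FROM employees
#   filter country == "USA" -> WHERE country = 'USA'
#   aggregate [...]         -> MAX / MIN / COUNT in the SELECT list
# This is an approximation, not what the PRQL compiler actually emits.
SQL = """
SELECT
  MAX(salary) AS max_salary,
  MIN(salary) AS min_salary,
  COUNT(*)    AS n_employees
FROM employees
WHERE country = 'USA'
"""


def us_salary_summary(rows):
    """Load rows into an in-memory SQLite table and run the query above."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE employees (name TEXT, country TEXT, salary INTEGER)"
        )
        conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)
        return conn.execute(SQL).fetchone()
    finally:
        conn.close()


print(us_salary_summary(ROWS))  # (120000, 95000, 3)
```

Note how the SQL has to be read "inside out" (`FROM`, then `WHERE`, then `SELECT`), whereas the PRQL version reads top to bottom in the order the data is transformed.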
Moreover, after each line you have a valid pipeline, which you can transform further by adding more steps (lines). This more closely matches how people construct data pipelines in R using dplyr/tidyverse and in Python using Pandas. If you find that it doesn't map well onto how you think about data pipelines, please let us know: we're constantly looking for more real-world examples to help us iterate on the language! |
Do you think the SQL compiled from PRQL can be optimized by the database engine as effectively as hand-written SQL?