Hacker News new | ask | show | jobs
by snthpy 1003 days ago
Disclaimer: I'm a core contributor to PRQL [1] and post about it a lot on HN. Apologies for jumping in on other people's threads, but for people interested in the headline, PRQL might be of interest.

At PRQL[1] we believe that SQL is a combination of two things:

1. Relational Algebra, which is eternal because it's just maths, and 2. A language designed in the 70s that looks like COBOL.

When people say that SQL will never die, they are usually thinking about Relational Algebra because SQL has been used interchangeably with that. With PRQL we agree that Relational Algebra is fundamental to thinking about data and we intend to keep that. However we've learned a lot about programming languages in the last 50 years and so PRQL is a revamp of SQL that brings the composability of functional languages and modern ergonomics to data transformations in order to improve the DX and UX of data scientists, data analysts and analytics engineers.

PRQL is simply a compiler that produces SQL so you can use it with whatever database you are currently using. It's completely open source with zero commercial associations and is deeply committed to staying that way forever.

1: https://prql-lang.org/

17 comments

The syntax comparison section will likely sour a lot of viewers who already know SQL. You try too hard to highlight how easy and terse PRQL is by putting comma-separated items on their own lines but in SQL you put each item on separate lines.

It may be typical of many SQL users and formatters, but it leaves a poor taste in the mouth that you aren't interested in an actual comparison but in marketing.

For those who already know SQL, the real question is: will it make my queries faster? Putting the FROM first isn't sufficiently compelling on its own. Having a processing pipeline, though marginally more elegant to look at, doesn't actually improve upon CTEs.

When you say you can use it with any database, how do you handle functions, stored procedures, jsonpath, and the massive differences in functionality between Oracle, MS SQL Server, Postgres, DB2, MySQL, MariaDB, H2, SQLite, etc.? Lowest common denominator?

After 49 years of SQL, more than syntax has to change; you need an engine that supports this natively and can actually improve planner behavior over existing engines.

I will grant that if you are limiting your target audience to primarily analytics, it's probably sufficient. The marketing of PRQL doesn't always appear to do this however.

Yeah the syntax comparison is deliberately misleading.

They style it as "4 lines vs 10 lines!" when it's actually 4 lines vs 4 lines.

   # PRQL
   from employees
   select {id, first_name, age}
   sort age
   take 10

   # Misleading SQL
   SELECT
     id,
     first_name,
     age
   FROM
     employees
   ORDER BY
     age
   LIMIT
     10

   # Actual SQL
   SELECT id, first_name, age
   FROM employees
   ORDER BY age
   LIMIT 10
The join example is similarly deceptive:

   # PRQL
   from employees
   join b=benefits (==employee_id)
   join side:left p=positions (p.id==employees.employee_id)
   select {employees.employee_id, p.role, b.vision_coverage}

   # Misleading SQL
   SELECT
     employees.employee_id,
     p.role,
     b.vision_coverage
   FROM
     employees
     JOIN benefits AS b ON employees.employee_id = b.employee_id
     LEFT JOIN positions AS p ON p.id = employees.employee_id

   # Actual SQL
   SELECT employees.employee_id, p.role, b.vision_coverage
   FROM employees
   JOIN benefits b USING employee_id
   LEFT JOIN positions p USING employee_id
Nonsense.
It is also (for me) quite a bit noisier and less readable than SQL. The join example has multiple different syntactic symbols, for no clear intuitive reason (= vs ==, {} vs (), side:left vs p=positions).
I actually like it; it makes sense at a glance, without having read a single line of the official documentation. But I think people should be free to use either syntax -- method or query -- as they see fit, without judgment of which one is better.
[PRQL dev here]

I strongly think we should have the best examples of SQL to compare against. I've ironically made this complaint for other libraries, so I'm alarmed that folks think we might have done the same.

We would take a PR for the first example if folks think that's better spacing. (I think the spacing is the only difference?)

For the second — `USING` isn't fully equivalent to `ON`. There are discussions on GH (https://github.com/PRQL/prql/issues/723) as to which we should compile to. In the meantime, we'd take a PR on anything that is equivalent.

> For the second — `USING` isn’t fully equivalent to `ON`.

In terms of portability because its not supported by, e.g., MSSQL, sure.

In terms of its semantics, though:

  t1 [LEFT/RIGHT/INNER] JOIN t2 USING col
Is fully equivalent to:

  t1 [LEFT/RIGHT/INNER] JOIN t2 ON (t1.col == t2.col)
So for a comparison to SQL as used by most RDBMSs (rather than MSSQL specifically), rather than “what should PRQL compile to”, USING is quite appropriate.

It may be that the intent of the homepage pairing is to highlight the actual compilation result and not provide a comparison to SQL-as-it-would-manually-be-written, but the presentation doesn’t make it clear that that’s the purpose.

I posted a link to the issue discussing this on the repo. That gives a good overview to those interested in this issue.

To take one point from there:

> Is fully equivalent to:

They're not fully equivalent — `USING` combines the two columns into a single column, `ON` doesn't.

USING just presumes the columns have the same name on both ends. There is no logical "combining", only a lexical one. It's why many DB admins/engineers use "foo_id" instead of "id" in their schemas as a rule. (Notably, also one of the reasons why many DB admins/engineers hate ORMs and other table generators that name id columns generically as "id".)

The person you are responding to is likely in this cohort. You're correct that they aren't exactly equivalent, but that's an artifact of your table definition, not the language. It likely seemed perfectly reasonable to the commenter that this naming was a trivial detail in the scope of the greater conversation.

> We would take a PR for the first example if folks think that's better spacing. ... we'd take a PR on anything that is equivalent.

"We'd take a PR" is a great line for someone who's already sold on the concept, but it's not super helpful for flaws in your marketing. Reading OP, it doesn't sound like they trust you enough to find it to be worth their time fixing your examples for you.

Yes, this is fair.

Though I'm not sure how to find consensus on what is the best representation of SQL without asking folks to make a reviewable suggestion.

I had thought the examples are currently in their clearest state — I definitely don't think it's obvious that removing line-breaks makes it look clearer (which is the full extent of the feedback IIUC...)

I opened an issue on this here: https://github.com/PRQL/prql/issues/3518
Also there's no need to capitalise, so the 4 lines vs. 4 lines is actually just 'order by & limit' vs. 'sort & take'. NBD really, but it's probably nicer in more complex examples.

But I wanted about this the last time PRQL was submitted in its own right - https://news.ycombinator.com/item?id=36869376 - so I'll leave it there!

Looks like they are showcasing the output of the compiler, not the hand written SQL.
PRQL is a solution in search of a problem. Lacking such, the problem had to be fabricated.
SQL has many problems that PRQL ostensibly addresses. Query reuse, composability, abstraction and more.
Query reuse, composability and abstraction are all sufficiently solved with views in my experience.
Then your experience doesn't extend to dynamic queries composed of query fragments, which is almost universal to all e-commerce sites.

Views are a degenerate, second-class form of query abstraction, as are common table expressions and the numerous other features added to SQL over the years to paper over the lack of proper first-class query abstractions.

It's like saying a programming language doesn't need first-class functions because we have disjoint unions and we can perform defunctionalization by hand. Strictly true, but really missing the point.

Does it SELECT * by default if I never define a SELECT below my FROM? ... Continuing to encourage folks by allowing them to SELECT * easier is would not be fun for me... I could be wrong?

Agreed, just parsing out the formatting so its "fewer lines" than traditional SQL soured me.

The expressions example is ridiculous, in Redshift I can do this all day?? SELECT 1 + 2 AS num1 , num1 * 2 AS num2 -- Literally no difference

Just learn SQL...

> Just learn SQL...

I know SQL, and I imagine the authors of PRQL know it better than I do.

Doesn't it seem weird that dozens of application languages have become popular since the 1970s, but we're still using dialects of the same old database query language? If it had a really elegant syntax, perhaps it wouldn't, but SQL's syntax is anything but. Some of the semantics can be awkward as well.

I, for one welcome attempts to move things forward (which is different from saying I'm going to run out and use PRQL in production tomorrow).

> Doesn't it seem weird that dozens of application languages have become popular since the 1970s, but we're still using dialects of the same old database query language?

Indeed. Do you honestly believe that a half-century of data storage professionals and vendors are blindly moving forward with a hobbled tool?

Or maybe there are aspects of SQL as a set-oriented 4th generation programming language that aren't apparent to folks who are intimately tied to an imperative or functional programming paradigm as opposed to a declarative DSL for set theory.

Within popular application languages, for any given paradigm, there are almost always several languages that aren't merely dialects of each other. C# isn't a dialect of Java. Ruby isn't a dialect of Python. Rust isn't a dialect of C++.

PRQL demonstrates that a set-oriented declarative language need not be a dialect of SQL and isn't the first language to do so (QUEL appeared in the 1970s). It seems odd to me these alternatives haven't gained much popularity.

> Putting the FROM first isn't sufficiently compelling on its own.

Personally I see that as not even neutral, it's a downside. Optimizing for autocomplete is an antipattern, code is read far more often that it's written and the SELECT clause is the interface to the following code. It should be easy to find when skimming, not buried in the query.

The SELECT clause is also akin to an assignment and it's extremely rare I see anyone advocating flipping the order of those to match what they say they want in SQL.

Edit: Since I'm sure someone is going to jump on it, yes, I'm conflicted about the WITH clause: It's extremely useful and I like what it does, so I do use it, but I don't like where it's positioned. I've been toying with indentation to work around it so SELECT is still just as visible as otherwise.

You might find Malloy interesting as it makes a greater departure from SQL syntax. Queries are first class objects which can be nested within each other in order to do trellising. It still compiles to SQL because that is the only language accepted by DBMSs today; however it will automatically write symmetric aggregate calculations and do those nestings that are hard for a human to write.

https://www.malloydata.dev/

How are views not first class query objects?
Yup.

"SQL doesn't have first-class query objects and isn't composable."

Uh… views, set-returning functions, temporary tables, CTEs…

"No, not like that! Something not declarative and set-oriented!"

SQL is declarative and set-oriented. 3rd generation language models for data were tried and discarded for good reason decades ago.

"But… but… I don't like set-oriented models! They make me feel dumb. I'm not dumb, so SQL must be dumb. Not meeeeeeeee!!!"

[PRQL dev here]

I strongly think we should have the best examples of SQL to compare against. I've ironically made this complaint for other libraries, so I'm alarmed that folks think we might have done the same.

We would take PRs for any improvements to the SQL that make it a better comparison.

Not a criticism, more of an observation.

The point of being able to write "FROM" before "SELECT" has become moot to me since many DBs support that SQL "convention" already.

The newline/indentation is something I would do for readability in the PRQL too.

At the end of the day, I'm becoming less concerned about these various query language syntaxes and more concerned about the logical and physical plans generated by the respective query engines.

In my experience, what ends up most problematic is that each query-engine/optimizer sometimes/usually requires SQL tweaks/nuances to perform as efficiently as possible. That's where reading/writing/maintaining queries can get really confusing/difficult. An abstraction such as PRQL, dataframes, etc might add value here, but only if the abstraction (or runtime/evn built around it) can tune the output SQL to each query engine as it is needed.

There are a lot of projects using frameworks/libraries such as Substrait and SQLGlot to accomplish this.

https://substrait.io/

https://github.com/tobymao/sqlglot

It's just syntax, it compiles to SQL and runs on today's DBMS. It has no difference in speed or functionality.
It does make a difference if the resulting SQL is unperformant.

Someone who is good at SQL can look at a query and see where the query planner might go wrong, then make subtle tweaks to get better performance. Optimizing queries in a compile-to-SQL language basically has to be left to the compiler, which may not have the context needed to write performant SQL.

Compiling a query is different than compiling a whole program to assembly, because the query makes assumptions about schemas and indexes that cannot be encapsulated in what you're feeding to the compiler at that moment.

Agreed.

Also, someone who ISN'T good at SQL can look at EXPLAIN output and see where the query planner HAS gone wrong.

Adding PRQL to the mix unambiguously makes that analysis and optimization step harder.

So I agree that unperformant SQL is unperformant, but PRQL can _reduce_ the chance of making mistakes there.

Here's an example [1] of someone reporting that a query engine was far more performant with one SQL construction, and then PRQL changing the SQL we output to use that construction.

GCC & Clang are much better at compiling to assembly than any person! PRQL isn't there yet, but each improvement scales to everyone who uses it.

[1]: https://github.com/PRQL/prql/issues/2182

[Disclaimer: PRQL dev]

Be honest. C to object code to linker to final binary are a MUCH bigger leap than PRQL to SQL.

You make it sound like SQL is some insurmountable hurdle while PRQL is a bunny slope. You're not getting anywhere with that nonsense.

[Disclaimer: not a PRQL dev]

I don't know where this aggression is coming from — we're an open-source project, only volunteers, trying to do something that makes it easier for folks to work with data.

Critical & constructive feedback would be really appreciated, but that's not this — why all the snark?

(I couldn't help but notice you didn't comment on the difference in formatting in the examples.)

Do you have examples of PRQL working with jsonpath? Generating JSON? Unnesting arrays? Returning ids from an INSERT or UPDATE without making a separate read query?

Not trying to be argumentative. Honest question.

I am not affiliated in any way with the PRQL project. Those are great questions though, I hope we get an answer.
@remram I replied to the parent comment.
Thanks, that's a great question. You're right in that so far we haven't highlighted working with JSON and I hadn't actually tried until this point. IMHO the true power of PRQL comes from the fact that it allows you to define functions and with that you get the power of composability which is the true power of almost every programming language (and which is for the most part completely lacking in SQL).

So with that said, I tried the following POC (remember that PRQL is just a SQL generator so the JSON capabilities depend on your underlying RDBMS):

    ```sh
    > prqlc compile <<EOF
    let get = path obj -> s"""{obj} -> {path}"""
    let getstr = path obj -> s"""{obj} ->> {path}"""
    let extract = obj path -> s"""json_extract({obj}, {path})"""
    
    from [{data='{"duck": [1, 2, 3]}'}]
    select { data | get '$.duck[0]', data | getstr '$.duck[1]', extract data '$.duck[2]' }
    EOF
    WITH table_0 AS (
      SELECT
        '{"duck": [1, 2, 3]}' AS data
    )
    SELECT
      data -> '$.duck[0]',
      data ->> '$.duck[1]',
      json_extract(data, '$.duck[2]')
    FROM
      table_0
    
    -- Generated by PRQL compiler version:0.9.4 (https://prql-lang.org)
    ```
What's going on here is that I used [s-strings](https://prql-lang.org/book/reference/syntax/s-strings.html) to define custom PRQL functions `get`, `getstr` and `extract` which translate into the underlying `->`, `->>` and `json_extract` SQL constructs.

You could then for example pipe that query to DuckDB (the example is taken from the following DuckDB blogpost [Shredding Deeply Nested JSON, One Vector at a Time](https://duckdb.org/2023/03/03/json.html)):

    ```sh
    > prqlc compile <<EOF - | duckdb
    let get = path obj -> s"""{obj} -> {path}"""
    let getstr = path obj -> s"""{obj} ->> {path}"""
    let extract = obj path -> s"""json_extract({obj}, {path})"""
    
    from [{data='{"duck": [1, 2, 3]}'}]
    select { data | get '$.duck[0]', data | getstr '$.duck[1]', extract data '$.duck[2]'}
    EOF
    ┌───────────────────────┬──────────────────────────┬───────────────────────────────────┐
    │ "data" -> '$.duck[0]' │ ("data" ->> '$.duck[1]') │ json_extract("data", '$.duck[2]') │
    │         json          │         varchar          │               json                │
    ├───────────────────────┼──────────────────────────┼───────────────────────────────────┤
    │ 1                     │ 2                        │ 3                                 │
    └───────────────────────┴──────────────────────────┴───────────────────────────────────┘
    ```
HTH
Excellent, thank you for the response.
For those interested who want to learn more, we have a number of presentations coming up at conferences on three continents:

- [QCon SF, October 2nd, San Francisco, USA: ](https://qconsf.com/presentation/oct2023/prql-simple-powerful...)

- [PyconZA, October 5th, Durban, South Africa: ](https://za.pycon.org/)

- [Community over Code (ApacheCon), October 9th, Halifax, Canada: ](https://communityovercode.org/schedule-list/#FT005)

- [data2day, October 12th, Karlsruhe, Germany: ](https://www.data2day.de/veranstaltung-21353-0-prql-a-modern-...)

SQL will never die for the same reason that JavaScript will never die: because it's built in to all major database engines.

In both cases, any other language will be starting as a second class citizen that has to compile to SQL/JS. During this phase of a new language's lifetime, it is either a surface-level syntactic change (a la Coffeescript) that provides no objective improvement, or it has to compile its simple semantic structures into opaque SQL/JS structures that will be off the beaten path and therefore not highly optimized by the runtime. Neither will reach sufficient adoption to become a first-class citizen in a major existing platform.

TypeScript succeeded where others failed because it provided much-needed static analysis while keeping the changes minimal enough that it's completely obvious what the runtime code will look like, so there are no unexpected performance gotchas. SQL, on the other hand, doesn't really need a TypeScript because SQL is highly statically analyzable by nature.

It's not that I don't believe we could do with an improvement on SQL, but I really don't see a realistic path forward for a replacement.

IMO a semantic layer is a nice UX/DX improvement over plain SQL in a business/analytics setting. I use a semantic layer* for >95% of use cases and fall back to SQL when needed. This balance will be different for each business of course.

* https://github.com/totalhack/zillion

While I mostly agree, there is a bit of Stockholms syndrome.

A lot of people don't know what they even could be missing.

For example, there is no succinct way of writing an antijoin in SQL .

The MERGE command has only been implemented by some engines due to (IIRC) concurrency concerns/ambiguities.

ANSI SQL JSON operations have improved but are still clunky.

Boolean NULL and IN is a clusterf of footguns.

Etc.

Oh, I agree! SQL is far from optimal, as is JS. My thoughts are more about the feasibility of a replacement than the need for one.

US electrical outlets are also highly flawed, but we're never going to replace them either. Standards, once entrenched, are nearly impossible to uproot.

I like US electrical outlets. They make for small, foldable plugs.

Yes, it's easier to shock one's self than with most other outlet designs, but the consequences of that are usually mild and help to instill a healthy respect for electricity.

[PRQL dev here]

I agree with the sentiments, even if not the conclusion. SQL is omnipresent and is "fine" in a lot of cases.

TypeScript is indeed a great example of the case; Kotlin too. I'd also add that databases are already adding PRQL support — ClickHouse has native support, there's a DuckDB extension, and folks are working on a Postgres extension.

One thing I'll respectfully disagree with — "SQL is highly statically analyzable by nature":

As a really basic example: `SELECT <expr> FROM tbl` — can we tell what shape the result is? In SQL, shapes / types require a lot of context — the result could be a single row in the case of `SUM(foo)`, or it could be every row in the case of `foo, bar`. More in https://prql-lang.org/faq/...

> As a really basic example: `SELECT <expr> FROM tbl` — can we tell what shape the result is?

Like most statically analyzable code, you have problems if you try to statically analyze part of the code without the relevant definitions.

But, yes, if you have the relevant code (e.g., the DDL for the table), you can. (Without it, you can in the sense that you can statically determine it as a function of the table definition, which can be sufficient in some cases.)

> in SQL, shapes / types require a lot of context — the result could be a single row in the case of `SUM(foo)`, or it could be every row in the case of `foo, bar`.

Oh, you mean, can we statically determine the shape of an expression’s results without knowing the expression? Well, no, and that’s true in most statically-analyzable languages.

Can you tell the shape of the result of `<expr>()` in Java by just reading it? No. Does that mean that Java can't be statically analyzed? Of course not!

A static analysis system is not restricted to weird abstract constructs like `select <expr>`, it sees the complete picture and can come to conclusions based on the concrete code construct it's given. There's absolutely nothing stopping a SQL static analysis from recognizing that `SELECT sum(foo)` will always return one row with one column that is an integer type, while `SELECT foo, bar` returns some number of rows that have foo and bar columns whose types can be inferred from the CREATE TABLE statements.

Again — we agree — with omniscience, it's possible to statically analyze everything. My claim is some languages can be statically analyzed with much less context than others.

Check out "Expression substitution" at https://www.scattered-thoughts.net/writing/against-sql/ if you're interested in more here.

Having some static constraints around <expr> makes it a lot easier to analyze though. Both for humans and compilers.

Otherwise you end up with situations like C++ templates, that can expand to anything. With long confusing errors, unable to distinguish if it’s an error in the usage, the definition or the call-site.

> One thing I'll respectfully disagree with — "SQL is highly statically analyzable by nature"

Are you suggesting that PRQL is capable of this? Or at least easier to do in PRQL?

Yes, much easier.

Check out "What’s this aggregate function?" at https://prql-lang.org/faq/. Without much context, we can understand the shape of a result.

And because queries can be longer without becoming unreadable, the lineage information is richer.

If you know <expr> then yes you know the shape of the data. You may not know the number of rows, but I fail to see how that makes it not analyzable.
Quoting from https://www.scattered-thoughts.net/writing/against-sql/:

---

There are many cases where a small change to a computation requires totally changing the structure of the query, but subqueries are my favourite because they're the most obvious way to express many queries and yet also provide so many cliffs to fall off.

    -- for each manager, find their employee with the highest salary
    > select
    >   manager.name,
    >   (select employee.name
    >    from employee
    >    where employee.manager = manager.name
    >    order by employee.salary desc
    >    limit 1)
    > from manager;
     name  | name
    -------+------
     alice | bob
---

If the inner query has more than one row, the query will raise an error. That's difficult to know from the SQL alone.

Difficult for the author, but not for static analysis. A static analysis system can pretty easily enforce that `limit 1` be set when a query is used in that position.
Well, Malloy is developed within Google by the founder of Looker, so there is a chance it could be natively integrated into BigQuery. At that point you have a next gen SQL replacement available on one of the most widely used analytics and transformation engines.
It's hard to say, if microsoft or google where behind prql and promoting it, it may as well become a typescript. There's a reason many developers use query builders, while not exactly the same, they want programming language features and familiarity of modern programming languages. Not saying prql is the correct approach here since I don't know it.
Pretty cool, your description got my click. I particularly enjoy that a filter is a filter before and after grouping.

One thing, the "showcase" section is not usable for me on mobile. The code box does not fit on the screen horizontally and I can't scroll right to see the remainder of it.

Thank you for the feedback. I'll let the team know.

We definitely want people on all devices to be able to learn about the project.

I've been excited in the abstract about PRQL for quite a while. But something FQL seems to have a much better handle on is the value of document-orientedness, or what you might alternatively call "gradual schematization".

This problem has been solved (if not beautifully, at least acceptably) by modern SQL databases that support a JSON storage format and associated "secondary query language".

I know PRQL has had an open issue on this subject for a while. I just want to note that I think this is truly one of the critical "missing pieces" to PRQL, without which it may never be able to break out into common usage.

FQL is interesting because it focuses on transactional systems and eliminating the need for an ORM in applications. I feel many of the SQL replacement projects like PRQL and Malloy instead come from the analytics side of the house, which doesn't really help application developers at all. (But does raise the question, how do I do analytics in Fauna? Do I ETL to a traditional warehouse system?)
That's a good question. If you need to ETL to a traditional data warehouse in order to do analytics, wouldn't you need an ORM, which is exactly what they're trying to avoid? (Also note that it's usually ELT instead of ETL these days, like if you're using Snowflake)

Or is Fauna a hybrid transactional/analytical database so you can do analytics in it using FQL? (Maybe the long-awaited possibility of true hybrid databases is real?!) But then you'd need to train all of your business intelligence analysts on FQL, which would be a drawback.

Great project, wish you all the best. Anything to try and unseat SQL from common use (we can all wish for the day we run PostgrespostSQL in production). At the moment the project is probably going to lose people because it isn't obvious how to get started - many SQL beginners don't know what a compiler is and will get confused by the docs.

For the sake of their sanity, it'd be worth considering putting an example of using the compiler on a local text file somewhere prominent on that site. That way beginners can go in, write some PSQL, compile it and use it against real SQL databases.

Or if not the compiler, make it clear how beginners are supposed to engage with this. There is a big need out there for something dplyr-like that works. There are a dizzying array of options and that isn't going to help some good people who need a bit of handholding.

Thank you for your feedback. That's really valuable!

We have the [PRQL Playground](https://prql-lang.org/playground/) exactly for that purpose.

We'll try and make it more prominent on the front page. I've also felt that we should have a "Getting started" page and will push that as a priority.

Ah, I see. Maybe you're just losing beginners who NoScript then. I hadn't figured out what the Playground was supposed to do but it turned out there was an extra disabled script.

Nevermind, thanks.

Is there any intention of eventually supporting DML or DDL statements? That's when the COBOL-like nature of SQL syntax is most frustrating. For example, in order to run "ALTER COLUMN ..." I have to parse a ridiculous BNF like this[0] almost every time. I'll never remember it.

Usually, the error is a gotcha built into the language syntax (e.g. forgot the keyword "TO").

[0] https://www.postgresql.org/docs/13/sql-altertable.html

SQL is based on relational calculus rather than relational algebra, which is why it's declarative. Relational algebra is built on fundamental relational operators (select, project, filter, product etc) which are imperatively applied. You can find out more about it here https://techdifferences.com/difference-between-relational-al...
At least superficially this looks a lot like C# LINQ to me in terms of structure and database independence (as for EF Core + LINQ). It’s in my top 3 features of that language.

https://www.tutorialsteacher.com/linq/sample-linq-queries

Edit: Shortened to link due to formatting issues

LINQ is definitely one of the big influences along with many other great projects that form the prior art in this space.

See this section in our FAQ: https://prql-lang.org/faq/#:~:text=Something%20here%20remind...

> Iterating through that would use lazy evaluation by default, returning row by row from the db as needed.

That's... not an advantage in most cases

Well, it’s an option when and IIRC a default. Here’s a discussion https://stackoverflow.com/questions/31366236/lazy-loading-vs...
Also reminds me of the Power Query M language - semantically anyway
Yep, it reads very easily like C# LINQ method & query syntax.
That first PRQL code sample is wonderfully readable.
It is! One suggestion to make it even more convincing: I'd love to see the SQL statement it compiles to.
This +1. I've been burned before when using ORMs which translate simple-looking queries to terribly inneficient SQL statements underwater.
Yeah I’m a web dev, and recently I found out the most popular JS ORM doesn’t produce joins. It’ll just execute multiple queries in sequence. I don’t know how common that is in the ORM landscape but for me that’s a deal breaker.
That’s just a direct result of lazy loading — if you don’t grab the related objects in the initial query, then there’s no joins to be had.

Most ORMs do lazy loading by default, but also have a way of doing eager loading — either requiring the nested object to always be loaded as well, or dropping down to some pseudo-sql.

In c#/EFCore, I always prefer to avoid lazy loading and just write LINQ, and just use the ORM to map the resultset back to objects

Common enough to have a name: the N+1 query problem.
1+N is clearer I think, and what I remember seeing in the past - it matches what's actually happening with this problem. First time I saw N+1 sometime around a year ago, I had no idea it referred to the same thing and thought it was something different.

N+1 looks like an issue with aggregation after a parallel run, something I've encountered with celery tasks before.

For reference, this is the output:

    WITH table_1 AS (
      SELECT
        customer_id,
        total,
        total - 0.8 AS _expr_0
      FROM
        invoices
      WHERE
        invoice_date >= DATE '1970-01-16'
    ),
    table_0 AS (
      SELECT
        COALESCE(SUM(_expr_0), 0) AS sum_income,
        customer_id
      FROM
        table_1
      WHERE
        _expr_0 > 1
      GROUP BY
        customer_id
      ORDER BY
        sum_income DESC
      LIMIT
        10
    )
    SELECT
      c.customer_id,
      CONCAT(c.last_name, ', ', c.first_name) AS name,
      table_0.sum_income,
      version() AS db_version
    FROM
      table_0
      JOIN customers AS c ON table_0.customer_id = c.customer_id
    ORDER BY
      table_0.sum_income DESC
Awesome! That's what we're hoping for. Great to hear that you find it wonderfully readable!
> we believe that SQL is a combination of two things:

> 1. Relational Algebra, which is eternal because it's just maths, and 2. A language designed in the 70s that looks like COBOL.

Your belief is as real as my belief that it rains too much in London ;) (that is, it is correct)

But why people have such hold on to such a quirky syntax beats me

How would you compare prql with dbt?
dbt integration was one of our major goals early on but we found that the interaction wasn't as straightforward as we had hoped.

There is an open PR in the dbt repo: https://github.com/dbt-labs/dbt-core/pull/5982#issuecomment-...

I have some ideas about future directions in this space where I believe PRQL could really shine. I will only be able to write those down in a couple of hours. I think this could be a really exciting direction for the project to grow into if anyone would like to collaborate and contribute!

dbt is just an orchestration tool. It uses SQL because that's what you need to pass to the target database. There is a python plugin if you prefer to use that for your models instead. Theoretically dbt could wrap any language your target system accepts. The actual configuration of the dbt runtime itself is done with yaml files.
That looks awesome. Does it support directly querying against databases (PostgreSQL, SQL Server, ...)? ie. is there a "Run" command in vscode that takes care of compiling & running the compiled sql?
This is tremendous. I'm curious to know if a CLI `prqlite3` exists which wraps around the `sqlite3` CLI many of us know and love.
Thank you.

The CLI usability was one of the aims behind [prql-query (pq)](https://github.com/prql/prql-query/). sqlite integration was on the roadmap but unfortunately that project has been largely unmaintained by me for the past 6 months. (This is just referring to prql-query and not PRQL which is under very active development.)

I'm working on a new project which will do exactly this (and a lot more!) which I hope to release next week. I'll drop the link here when that's ready.

How is the language server support?
We do have grammars for many editors. I'm not sure about LSP off the top of my head but you should find something in the docs under "Integrations".

There is also a VSCode extension: https://marketplace.visualstudio.com/items?itemName=prql-lan...

[PRQL dev here]

We don't have LSP support yet, but it's on the Roadmap. We've designed the language to be very LSP-friendly — one of the benefits of starting with `from` and pipelining each function.

In that first example, is the last line superfluous? It doesn't seem to be used.
hey, is compile time verification of queries supported for PRQL in Rust?
Not yet, but looking at what sqlx does, I think we should be able to do something similar.

It's been a small team of core contributors so far but in the last three months we've seen more people making their first PR and then going on to contribute more over time so the momentum is growing.

We'd definitely be open to contributions in this space.