Hacker News new | ask | show | jobs
by maximilianroos 1003 days ago
[PRQL dev here]

I agree with the sentiments, even if not the conclusion. SQL is omnipresent and is "fine" in a lot of cases.

TypeScript is indeed a great example of the case; Kotlin too. I'd also add that databases are already adding PRQL support — ClickHouse has native support, there's a DuckDB extension, and folks are working on a Postgres extension.

One thing I'll respectfully disagree with — "SQL is highly statically analyzable by nature":

As a really basic example: `SELECT <expr> FROM tbl` — can we tell what shape the result is? In SQL, shapes / types require a lot of context — the result could be a single row in the case of `SUM(foo)`, or it could be every row in the case of `foo, bar`. More in https://prql-lang.org/faq/...

4 comments

> As a really basic example: `SELECT <expr> FROM tbl` — can we tell what shape the result is?

Like most statically analyzable code, you have problems if you try to statically analyze part of the code without the relevant definitions.

But, yes, if you have the relevant code (e.g., the DDL for the table), you can. (Without it, you can in the sense that you can statically determine it as a function of the table definition, which can be sufficient in some cases.)

> in SQL, shapes / types require a lot of context — the result could be a single row in the case of `SUM(foo)`, or it could be every row in the case of `foo, bar`.

Oh, you mean, can we statically determine the shape of an expression’s results without knowing the expression? Well, no, and that’s true in most statically-analyzable languages.

Can you tell the shape of the result of `<expr>()` in Java by just reading it? No. Does that mean that Java can't be statically analyzed? Of course not!

A static analysis system is not restricted to weird abstract constructs like `select <expr>`, it sees the complete picture and can come to conclusions based on the concrete code construct it's given. There's absolutely nothing stopping a SQL static analysis from recognizing that `SELECT sum(foo)` will always return one row with one column that is an integer type, while `SELECT foo, bar` returns some number of rows that have foo and bar columns whose types can be inferred from the CREATE TABLE statements.

Again — we agree — with omniscience, it's possible to statically analyze everything. My claim is some languages can be statically analyzed with much less context than others.

Check out "Expression substitution" at https://www.scattered-thoughts.net/writing/against-sql/ if you're interested in more here.

Having some static constraints around <expr> makes it a lot easier to analyze though. Both for humans and compilers.

Otherwise you end up with situations like C++ templates, that can expand to anything. With long confusing errors, unable to distinguish if it’s an error in the usage, the definition or the call-site.

> One thing I'll respectfully disagree with — "SQL is highly statically analyzable by nature"

Are you suggesting that PRQL is capable of this? Or at least easier to do in PRQL?

Yes, much easier.

Check out "What’s this aggregate function?" at https://prql-lang.org/faq/. Without much context, we can understand the shape of a result.

And because queries can be longer without becoming unreadable, the lineage information is richer.

If you know <expr> then yes you know the shape of the data. You may not know the number of rows, but I fail to see how that makes it not analyzable.
Quoting from https://www.scattered-thoughts.net/writing/against-sql/:

---

There are many cases where a small change to a computation requires totally changing the structure of the query, but subqueries are my favourite because they're the most obvious way to express many queries and yet also provide so many cliffs to fall off.

    -- for each manager, find their employee with the highest salary
    > select
    >   manager.name,
    >   (select employee.name
    >    from employee
    >    where employee.manager = manager.name
    >    order by employee.salary desc
    >    limit 1)
    > from manager;
     name  | name
    -------+------
     alice | bob
---

If the inner query has more than one row, the query will raise an error. That's difficult to know from the SQL alone.

Difficult for the author, but not for static analysis. A static analysis system can pretty easily enforce that `limit 1` be set when a query is used in that position.