Hacker News new | ask | show | jobs
by lolinder 1003 days ago
SQL will never die for the same reason that JavaScript will never die: because it's built in to all major database engines.

In both cases, any other language will be starting as a second class citizen that has to compile to SQL/JS. During this phase of a new language's lifetime, it is either a surface-level syntactic change (a la Coffeescript) that provides no objective improvement, or it has to compile its simple semantic structures into opaque SQL/JS structures that will be off the beaten path and therefore not highly optimized by the runtime. Neither will reach sufficient adoption to become a first-class citizen in a major existing platform.

TypeScript succeeded where others failed because it provided much-needed static analysis while keeping the changes minimal enough that it's completely obvious what the runtime code will look like, so there are no unexpected performance gotchas. SQL, on the other hand, doesn't really need a TypeScript because SQL is highly statically analyzable by nature.

It's not that I don't believe we could do with an improvement on SQL, but I really don't see a realistic path forward for a replacement.

5 comments

IMO a semantic layer is a nice UX/DX improvement over plain SQL in a business/analytics setting. I use a semantic layer* for >95% of use cases and fall back to SQL when needed. This balance will be different for each business of course.

* https://github.com/totalhack/zillion

While I mostly agree, there is a bit of Stockholms syndrome.

A lot of people don't know what they even could be missing.

For example, there is no succinct way of writing an antijoin in SQL .

The MERGE command has only been implemented by some engines due to (IIRC) concurrency concerns/ambiguities.

ANSI SQL JSON operations have improved but are still clunky.

Boolean NULL and IN is a clusterf of footguns.

Etc.

Oh, I agree! SQL is far from optimal, as is JS. My thoughts are more about the feasibility of a replacement than the need for one.

US electrical outlets are also highly flawed, but we're never going to replace them either. Standards, once entrenched, are nearly impossible to uproot.

I like US electrical outlets. They make for small, foldable plugs.

Yes, it's easier to shock one's self than with most other outlet designs, but the consequences of that are usually mild and help to instill a healthy respect for electricity.

[PRQL dev here]

I agree with the sentiments, even if not the conclusion. SQL is omnipresent and is "fine" in a lot of cases.

TypeScript is indeed a great example of the case; Kotlin too. I'd also add that databases are already adding PRQL support — ClickHouse has native support, there's a DuckDB extension, and folks are working on a Postgres extension.

One thing I'll respectfully disagree with — "SQL is highly statically analyzable by nature":

As a really basic example: `SELECT <expr> FROM tbl` — can we tell what shape the result is? In SQL, shapes / types require a lot of context — the result could be a single row in the case of `SUM(foo)`, or it could be every row in the case of `foo, bar`. More in https://prql-lang.org/faq/...

> As a really basic example: `SELECT <expr> FROM tbl` — can we tell what shape the result is?

Like most statically analyzable code, you have problems if you try to statically analyze part of the code without the relevant definitions.

But, yes, if you have the relevant code (e.g., the DDL for the table), you can. (Without it, you can in the sense that you can statically determine it as a function of the table definition, which can be sufficient in some cases.)

> in SQL, shapes / types require a lot of context — the result could be a single row in the case of `SUM(foo)`, or it could be every row in the case of `foo, bar`.

Oh, you mean, can we statically determine the shape of an expression’s results without knowing the expression? Well, no, and that’s true in most statically-analyzable languages.

Can you tell the shape of the result of `<expr>()` in Java by just reading it? No. Does that mean that Java can't be statically analyzed? Of course not!

A static analysis system is not restricted to weird abstract constructs like `select <expr>`, it sees the complete picture and can come to conclusions based on the concrete code construct it's given. There's absolutely nothing stopping a SQL static analysis from recognizing that `SELECT sum(foo)` will always return one row with one column that is an integer type, while `SELECT foo, bar` returns some number of rows that have foo and bar columns whose types can be inferred from the CREATE TABLE statements.

Again — we agree — with omniscience, it's possible to statically analyze everything. My claim is some languages can be statically analyzed with much less context than others.

Check out "Expression substitution" at https://www.scattered-thoughts.net/writing/against-sql/ if you're interested in more here.

Having some static constraints around <expr> makes it a lot easier to analyze though. Both for humans and compilers.

Otherwise you end up with situations like C++ templates, that can expand to anything. With long confusing errors, unable to distinguish if it’s an error in the usage, the definition or the call-site.

> One thing I'll respectfully disagree with — "SQL is highly statically analyzable by nature"

Are you suggesting that PRQL is capable of this? Or at least easier to do in PRQL?

Yes, much easier.

Check out "What’s this aggregate function?" at https://prql-lang.org/faq/. Without much context, we can understand the shape of a result.

And because queries can be longer without becoming unreadable, the lineage information is richer.

If you know <expr> then yes you know the shape of the data. You may not know the number of rows, but I fail to see how that makes it not analyzable.
Quoting from https://www.scattered-thoughts.net/writing/against-sql/:

---

There are many cases where a small change to a computation requires totally changing the structure of the query, but subqueries are my favourite because they're the most obvious way to express many queries and yet also provide so many cliffs to fall off.

    -- for each manager, find their employee with the highest salary
    > select
    >   manager.name,
    >   (select employee.name
    >    from employee
    >    where employee.manager = manager.name
    >    order by employee.salary desc
    >    limit 1)
    > from manager;
     name  | name
    -------+------
     alice | bob
---

If the inner query has more than one row, the query will raise an error. That's difficult to know from the SQL alone.

Difficult for the author, but not for static analysis. A static analysis system can pretty easily enforce that `limit 1` be set when a query is used in that position.
Well, Malloy is developed within Google by the founder of Looker, so there is a chance it could be natively integrated into BigQuery. At that point you have a next gen SQL replacement available on one of the most widely used analytics and transformation engines.
It's hard to say, if microsoft or google where behind prql and promoting it, it may as well become a typescript. There's a reason many developers use query builders, while not exactly the same, they want programming language features and familiarity of modern programming languages. Not saying prql is the correct approach here since I don't know it.