Hacker News new | ask | show | jobs
by wvenable 746 days ago
I kind of disagree; the problem with SQL is that fundamentally it's actually pretty good. So alternatives either tend to be too radical (throwing out the baby with the bathwater) or simply not enough of an improvement to gain any momentum.

I feel like rational database querying is effectively solved and there's little point in re-litigating it. But still I'd be happy to switch to the perfect replacement if someone develops it.

2 comments

What does "good" mean in this context? SQL is not modular, most features are highly context-dependent and there numerous handguns.

Sql might be ok for trivial things, as in OLTP that programmers tend to work with.

But anything even slightly more advanced is... not nice.

And the standard is unique in its uselessness.

The underlying relational algebra model is brilliant thought.

> SQL is not modular, most features are highly context-dependent...

Examples?

> SQL might be OK for trivial things, as in OLTP...

What is the threshold for triviality? I've seen understandable fairly complex queries, but they're not mind-twisters by any means; if you know what you need, and understand your data, >>and are not a layperson regarding databases<< it's doable without much sweat.

> But anything even slightly more advanced is ... not nice

Again, what is the threshold, or at least what is your threshold, for triviality vs. non-trivial?

You say it's "unique in its uselessness" but "the underlying relational algebra model is brilliant", can you explain a bit further what you mean by that?

Good means it gets the job done in a fairly logical and readable way. SQL queries are not giant programs and shouldn't be. I've written some very advanced queries with plenty of common table expressions, subselects, etc. Could it be more modular? Sure. Could the syntax be better? Yes. But would that radically change how queries are written? Not really.

The worst SQL I've ever seen is when someone attempts to program it imperatively. It takes a different mentality to write SQL then to write imperative code.

> But would that radically change how queries are written? Not really.

Rust didn't radically change how applications are written as compared to C, but that didn't stop us. Nor should it. Any improvement is worthwhile. It doesn't need to be radical.

But, like another commenter points out, SQL is like Javascript. Both having ecosystems so horrendously conceived that they have ensured there is no good path to replacing/augmenting them.

Rust hasn't swept the world yet and it helps prevent real bugs and security issues. A better query language may make it fractionally easier to write database queries but you have to toss out a half-century of experience. It's not worth it for marginal gains. This isn't even opinion, this is the reason it has never happened.

I think I agree that SQL is like JavaScript. If JavaScript wasn't as expressively powerful as it is, it would have been replaced a long time ago. But it's actually good enough, despite it's quirks, that there doesn't exist a language better enough to make it worth replacement. It's possible such a language might never exist. And both SQL and JavaScript continue to improve sometimes directly stealing ideas from potential competitors.

> If JavaScript wasn't as expressively powerful as it is, it would have been replaced a long time ago.

No, it wouldn't matter how terrible or in-expressive the language is - once it got rolled out to the web browser, then it would never be replaced. That is why you have WASM nowadays - you can add new stuff but never replace stuff.

I disagree. If JavaScript was objectively limited, one of the browser makers would have just added something else and if it was good enough it would spread to others. Just like other browser technologies (for example, xmlHttpRequest). In a way, that's what happens with JavaScript right now. It continues to evolve.

Browsers did support multiple programming languages with the language attribute on the script tag. This is how Microsoft added VBScript to IE.

> That is why you have WASM nowadays

As an aside, I always figured that the "WASM" of databases would come some day. With SQLite recently publishing details about its bytecode engine, perhaps that is looking more realistic?

> Rust hasn't swept the world yet and it helps prevent real bugs and security issues.

As the inventor of the relational model has written about extensively, and as you have no doubt came to realize yourself if you've used SQL for more than a few minutes, a different query language could have prevented a whole lot of real bugs too. SQL also has its fair share of security problems that are only prevented by telling developers to be careful.

Rust will never sweep the world, of course, because there is no reason to choose a single language in the application space. Something sweeping the world tells that you royally screwed up the execution environment. But it is a viable contender, despite being no different than C in any meaningful way (clearly you don't see bug/security issue prevention as being meaningful).

> despite it's quirks, that there doesn't exist a language better enough to make it worth replacement.

As you know, Postgres went in the opposite direction, eventually switching to SQL. A DMBS – one which is probably the second most popular DMBS in existence at that – completely upending what query language it supports is not without precedent. What do you think it was about SQL that made it substantially better to justify the change?

You seem to have my point backwards. It's not that SQL is substantially better than QUEL, it's that QUEL was not substantially better than SQL. Ultimately, what made Postgres interesting wasn't the query language.

> a different query language could have prevented a whole lot of real bugs too. SQL also has its fair share of security problems that are only prevented by telling developers to be careful.

I can't think of an example where one of these newer query languages actually solve bugs in the way that Rust does with the borrow checker. Slightly better syntax may prevent bugs but no where near in the same way. I don't find SQL particularly bug-inducing -- it's mostly just annoying.

SQL isn't the nicest language to write, but I do find it one of the most readable languages out there.

> The worst SQL I've ever seen is when someone attempts to program it imperatively. It takes a different mentality to write SQL then to write imperative code.

Use to any me when my team lead who was a fairly shitty programmer used to describe queries to me in an imperative way. "IF, blah blah blah, THEN". You need to think of your queries as "Return this to me WHEN". Once you get your head around this, there is something quite elegant about the relational model.

All of this is true.

"Being ok for trivial things... that programmers tend to work" with is precisely why SQL has won over the years. SQL has never been a great general-purpose language, but is fantastic at the things programmers tend to work with. That SQL (and associated databases) keeps getting extended to meet new data storage/retrieval problems has really helped SQL stick with us.

Honestly for OLAP use cases, I think dataframe apis are superior.
I gained a a few years of experience in SQL-based OLAP systems at my current job. In this time I developed a strong appreciation for SQL, especially for its composability. Recently, I started a project in Google Colab, gluing together queries from several systems with Pandas DataFrames. I can honestly say that I've never been more frustrated learning an API than I have with Pandas.

Need some window function like LAG() or LEAD()? Too bad, I hope you like writing Python "for i in range(...):" loops. My notebook is littered with ".reset_index()" calls, ".replace(np.nan, None)", "axis='columns'", "foo.assign(bar=lambda df: df.apply(lambda row: ...))". groupby is especially confusing to me, as a Pandas GroupBy is difficult to compose with a normal DataFrame until you call .reset_index(). Compare this to SQL, where a subquery is a subquery, whether or not it has a GROUP BY clause.

The Pandas documentation also leaves a lot to be desired. Take the documentation of pandas.NaT[1] for example. "pandas.NaT: alias of NaT". Ok? That still doesn't tell me what NaT is, nor does it link to the thing that it aliases. The groupby documentation[2] also caused me some headaches, as it covers only the simplest aggregation use-cases.

Pandas is clearly better for some use-cases, but mostly for simple operations that are well-supported by the API (perhaps numeric operations that are implemented with native numpy routines). But if I'm doing some interactive OLAP stuff, I'll reach for SQL. Perhaps the problem is I'm trying to use Pandas like it's SQL, when it's not. But for manipulating data, I'd rather use a language than a library.

[1] https://pandas.pydata.org/docs/reference/api/pandas.NaT.html [2] https://pandas.pydata.org/docs/user_guide/groupby.html

edit: half a sentence

I should've specified polars, ibis, dplyr, pyspark, etc.

Pandas is easily the worst dataframe api.

I'll never go back to SQL from polars, it's far superior in both composability and readability imo.

Not to mention complex transforms can be version controlled and unit tested, and then you can compose these together.

It also maps to/from SQL quite naturally.

I think the fundamental problem is that if you want to talk to databases, you have to speak SQL. Like Javascript on the web, you are stuck with what the platform provides. Both languages have significant deficiencies, yet is/was the only game in town.

Sure, there are some languages which can compile down to SQL, but like Typescript, every once and a while, you find some edge case where the transpilation fails you and you might as well be an expert in SQL.

I love all of the ideas of PRQL, but I do not know if I would be painting myself into a corner adopting something that will go the way of CoffeeScript.