Hacker News new | ask | show | jobs
by vkazanov 746 days ago
What does "good" mean in this context? SQL is not modular, most features are highly context-dependent and there numerous handguns.

Sql might be ok for trivial things, as in OLTP that programmers tend to work with.

But anything even slightly more advanced is... not nice.

And the standard is unique in its uselessness.

The underlying relational algebra model is brilliant thought.

4 comments

> SQL is not modular, most features are highly context-dependent...

Examples?

> SQL might be OK for trivial things, as in OLTP...

What is the threshold for triviality? I've seen understandable fairly complex queries, but they're not mind-twisters by any means; if you know what you need, and understand your data, >>and are not a layperson regarding databases<< it's doable without much sweat.

> But anything even slightly more advanced is ... not nice

Again, what is the threshold, or at least what is your threshold, for triviality vs. non-trivial?

You say it's "unique in its uselessness" but "the underlying relational algebra model is brilliant", can you explain a bit further what you mean by that?

Good means it gets the job done in a fairly logical and readable way. SQL queries are not giant programs and shouldn't be. I've written some very advanced queries with plenty of common table expressions, subselects, etc. Could it be more modular? Sure. Could the syntax be better? Yes. But would that radically change how queries are written? Not really.

The worst SQL I've ever seen is when someone attempts to program it imperatively. It takes a different mentality to write SQL then to write imperative code.

> But would that radically change how queries are written? Not really.

Rust didn't radically change how applications are written as compared to C, but that didn't stop us. Nor should it. Any improvement is worthwhile. It doesn't need to be radical.

But, like another commenter points out, SQL is like Javascript. Both having ecosystems so horrendously conceived that they have ensured there is no good path to replacing/augmenting them.

Rust hasn't swept the world yet and it helps prevent real bugs and security issues. A better query language may make it fractionally easier to write database queries but you have to toss out a half-century of experience. It's not worth it for marginal gains. This isn't even opinion, this is the reason it has never happened.

I think I agree that SQL is like JavaScript. If JavaScript wasn't as expressively powerful as it is, it would have been replaced a long time ago. But it's actually good enough, despite it's quirks, that there doesn't exist a language better enough to make it worth replacement. It's possible such a language might never exist. And both SQL and JavaScript continue to improve sometimes directly stealing ideas from potential competitors.

> If JavaScript wasn't as expressively powerful as it is, it would have been replaced a long time ago.

No, it wouldn't matter how terrible or in-expressive the language is - once it got rolled out to the web browser, then it would never be replaced. That is why you have WASM nowadays - you can add new stuff but never replace stuff.

I disagree. If JavaScript was objectively limited, one of the browser makers would have just added something else and if it was good enough it would spread to others. Just like other browser technologies (for example, xmlHttpRequest). In a way, that's what happens with JavaScript right now. It continues to evolve.

Browsers did support multiple programming languages with the language attribute on the script tag. This is how Microsoft added VBScript to IE.

>If JavaScript was objectively limited, one of the browser makers would have just added something else and if it was good enough it would spread to others.

So like Shockwave, or Flash, or Silverlight, or Adobe AIR, or java applets, or the entire ActiveX ecosystem or...

We literally only moved things to javascript after Google spent a billion dollars writing a hyperoptimized javascript engine that has to make a pact with the devil (pretty much every javascript based exploit of computers relies on the fact that it is leakily compiled JIT to machine code, if javascript required less optimization to be useful, the internet would be a less exploitable place) just so that you could read your email in a web browser in a slightly less ugly way.

It also required several doublings in normal person computing power to be usable, and literal armies of 20 somethings writing in a couple giant abstraction layers that had to reinvent the world to do anything useful. It is still the primary burner of average person computing power, to run a million lines of javascript to do the exact same shit we did in the 90s with a 386.

Javascript should be seen as a systemic failure. If it's supposed to be "assembly for the web" as it seems to be treated now, then it needs to be VASTLY more efficient to run.

> That is why you have WASM nowadays

As an aside, I always figured that the "WASM" of databases would come some day. With SQLite recently publishing details about its bytecode engine, perhaps that is looking more realistic?

> Rust hasn't swept the world yet and it helps prevent real bugs and security issues.

As the inventor of the relational model has written about extensively, and as you have no doubt came to realize yourself if you've used SQL for more than a few minutes, a different query language could have prevented a whole lot of real bugs too. SQL also has its fair share of security problems that are only prevented by telling developers to be careful.

Rust will never sweep the world, of course, because there is no reason to choose a single language in the application space. Something sweeping the world tells that you royally screwed up the execution environment. But it is a viable contender, despite being no different than C in any meaningful way (clearly you don't see bug/security issue prevention as being meaningful).

> despite it's quirks, that there doesn't exist a language better enough to make it worth replacement.

As you know, Postgres went in the opposite direction, eventually switching to SQL. A DMBS – one which is probably the second most popular DMBS in existence at that – completely upending what query language it supports is not without precedent. What do you think it was about SQL that made it substantially better to justify the change?

You seem to have my point backwards. It's not that SQL is substantially better than QUEL, it's that QUEL was not substantially better than SQL. Ultimately, what made Postgres interesting wasn't the query language.

> a different query language could have prevented a whole lot of real bugs too. SQL also has its fair share of security problems that are only prevented by telling developers to be careful.

I can't think of an example where one of these newer query languages actually solve bugs in the way that Rust does with the borrow checker. Slightly better syntax may prevent bugs but no where near in the same way. I don't find SQL particularly bug-inducing -- it's mostly just annoying.

> It's not that SQL is substantially better than QUEL, it's that QUEL was not substantially better than SQL.

You seem to have your own point backwards, unless you have failed to make one. The premise you gave, at least as I understand it, is that there is no reason to put in the effort in moving away from SQL because nothing else is substantially better. So, by the same token, unless SQL was substantially better than QUEL, there should have been no reason for Postgres to put in the effort to make the same transition.

Which implies that SQL was substantially better. The question was: In what way?

> I can't think of an example where one of these newer query languages actually solve bugs

Perhaps because you are getting hung up on newer? They need not even be newer. The most glaring bug-inducing "problem"[1] of SQL was already recognized and solved by the original database querying language, Alpha.

[1] Problem might not be the right framing, but I lack a better word. There is no problem if you write perfect queries every single time. But, like C, it opens opportunities for making mistakes that could have been made impossible with a different design.

SQL isn't the nicest language to write, but I do find it one of the most readable languages out there.

> The worst SQL I've ever seen is when someone attempts to program it imperatively. It takes a different mentality to write SQL then to write imperative code.

Use to any me when my team lead who was a fairly shitty programmer used to describe queries to me in an imperative way. "IF, blah blah blah, THEN". You need to think of your queries as "Return this to me WHEN". Once you get your head around this, there is something quite elegant about the relational model.

All of this is true.

"Being ok for trivial things... that programmers tend to work" with is precisely why SQL has won over the years. SQL has never been a great general-purpose language, but is fantastic at the things programmers tend to work with. That SQL (and associated databases) keeps getting extended to meet new data storage/retrieval problems has really helped SQL stick with us.

Honestly for OLAP use cases, I think dataframe apis are superior.
I gained a a few years of experience in SQL-based OLAP systems at my current job. In this time I developed a strong appreciation for SQL, especially for its composability. Recently, I started a project in Google Colab, gluing together queries from several systems with Pandas DataFrames. I can honestly say that I've never been more frustrated learning an API than I have with Pandas.

Need some window function like LAG() or LEAD()? Too bad, I hope you like writing Python "for i in range(...):" loops. My notebook is littered with ".reset_index()" calls, ".replace(np.nan, None)", "axis='columns'", "foo.assign(bar=lambda df: df.apply(lambda row: ...))". groupby is especially confusing to me, as a Pandas GroupBy is difficult to compose with a normal DataFrame until you call .reset_index(). Compare this to SQL, where a subquery is a subquery, whether or not it has a GROUP BY clause.

The Pandas documentation also leaves a lot to be desired. Take the documentation of pandas.NaT[1] for example. "pandas.NaT: alias of NaT". Ok? That still doesn't tell me what NaT is, nor does it link to the thing that it aliases. The groupby documentation[2] also caused me some headaches, as it covers only the simplest aggregation use-cases.

Pandas is clearly better for some use-cases, but mostly for simple operations that are well-supported by the API (perhaps numeric operations that are implemented with native numpy routines). But if I'm doing some interactive OLAP stuff, I'll reach for SQL. Perhaps the problem is I'm trying to use Pandas like it's SQL, when it's not. But for manipulating data, I'd rather use a language than a library.

[1] https://pandas.pydata.org/docs/reference/api/pandas.NaT.html [2] https://pandas.pydata.org/docs/user_guide/groupby.html

edit: half a sentence

I should've specified polars, ibis, dplyr, pyspark, etc.

Pandas is easily the worst dataframe api.

I'll never go back to SQL from polars, it's far superior in both composability and readability imo.

Not to mention complex transforms can be version controlled and unit tested, and then you can compose these together.

It also maps to/from SQL quite naturally.