Hacker News new | ask | show | jobs
by tkot 1391 days ago
It's true though. Relational algebra is a very nice abstraction but SQL as a language is subpar to say the least. SQL is really unergonomic, it's difficult to reuse SQL code which is why there are no SQL libraries worth talking about and it's the reason why people go out of their way to create tools to replace SQL like ORMs or LINQ.

SQL is basically a DSL for data processing just like XSLT for example. Would you call XSLT "designed for large scale programming"?

The "they will not be very productive with it" comment is a bit of a stretch - it can be very productive in the domain it is designed for - but you can often replace SQL at its job (like for example use Dataframe API in Spark instead of SQL API) while there are many tasks that SQL is a very poor fit for.

6 comments

As much as SQL could be better, it's a very strong local optimum. There is no credible replacement whatsoever for what it does.

It's true people try to replace it, but most solutions are of rather questionable quality. I personally argue against ORMs in pretty much any situation.

> SQL is basically a DSL for data processing

Yeah and what's wrong with that? SQL databases power everything, including data sources with billions of records that serve millions of customers, with hundreds of developers/DBAs working on them. Does that not count as programming in the large?

> There is no credible replacement whatsoever for what it does.

I'm compelled to agree with this under protest. There was a language called Dataphor based on D and the Third Manifesto, it's unfortunately hard to find even sample code any longer, but it would be a strict improvement on SQL.

Did I mention you can't even find a corpus? Good luck running any implementation on a modern system.

I'm reasonably content writing SQL, but I know a syntax with the same power but lacking several disadvantages is possible, and I'd rather use a mature implementation of that instead, if I were able.

The worst part of SQL is indeed how hard it is to compose. Something like D, where you would keep the relational model but add user-defined types would indeed be a strict improvement.

The reason why I'm skeptical of most proposed alternatives is that I'm fundamentally in your exact same position: I know SQL isn't perfect but I'm reasonably content with it; and they invariably all end up throwing away the "good parts" of SQL (relational model, declarative, easy stuff is easy).

The first step of an hypothetical solution that replaces SQL isn't "SQL sucks", it's "SQL is extremely good at what it does but has problems that are only fixable with a new language".

The Dataphor stuff was all open sourced I believe. But it's in a C#/MS ecosystem, and tied into their overall application builder tool etc. Not really a standlone product.

There's other "D" type projects. Rel is one. Don't have URL handy but it's fairly easy to find. They publish their grammar and example code.

None of these tools have ever matured or become popular. I have theories why, we could discuss forever.

I think the issue has more to do with two things: a) most people don't understand the relational model fully, so they have no idea what they're missing b) new databases (and existing) simply cannot afford to rock the boat here because they need customers. And in terms of the architecture of the DB system, the SQL parser is one of the lower effort items (when compared to query planner, optimization, storage impl and storage optimization, replication, etc.) So why invest there for little value? Customers aren't asking for it.

The company I'm contracting for right now has some interest in working in this space.

Date & Darwin did good work with "The Third Manifesto" but it went almost completely ignored, and the tone and target of it may have been off. Frustratingly we went through a phase where SQL went out of style and then back into style ("NewSQL") and so there may have been a window missed there where alternative query languages (but still based on the relational model) could have risen. But instead "NoSQL" was too interested in jettisoning the relational model along with SQL (mostly I would argue because they don't understand it).

There has been some recent rise in Datalog implementations in the Clojure community. That is interesting, though not strictly as an alternative to SQL.

>As much as SQL could be better, it's a very strong local optimum

I mostly agree, though I think that SQL's popularity comes mostly from network effects.

> Yeah and what's wrong with that?

Nothing is wrong wrong with that. I just think that it's reasonable to interpret the words "programming in the large" as "being usable as general purpose programming language" (which I admit is not a very strict definition) whereas SQL (or XSLT or awk or bash...) seem well suited for certain niches (important niches nonetheless).

Prolog is a better RM language than SQL. (Codd had a language called Alpha but that's about all I know about it.) It's not a credible replacement for all use cases, but it's pretty capable.
>SQL as a language is subpar to say the least. SQL is really unergonomic, it's difficult to reuse SQL code

On the DML side I've found SQL to be very reusable at any company I've worked for - grep/search repos for the tables of interest, throw the queries in a CTE and you're off to the races. If the data infrastructure is robust you can probably just query from ([un]materialized) views - quite literally SQL code reuse. And even across vastly different domains, even if not directly resuable, SQL queries are still highly transferable. I can see it being more true on the DDL side but even there, at least anecdotally most of the DBAs/devops/infrastructure engineers etc I've met seemed to have favorable impressions of SQL when needed in between the tooling.

>while there are many tasks that SQL is a very poor fit for.

You probably don't want to try and perform graphical rendering tasks, and it's true certain use cases like dynamically generating SQL statements from table/column names _and then executing_ them requires some manual input. But which data/processing related tasks is SQL a very poor fit for?

In my experience it's much more common to actually see the opposite - inefficient usage of ORMs, exporting datasets only to then perform pre/postprocessing, often requiring building programming "jigs" to circumvent bottlenecks, etc when it could've been done in a more streamlined manner in the DB/warehouse.

I agree regarding SQL views - it's probably the most frequently used way of reusing SQL code.

> And even across vastly different domains, even if not directly reusable, SQL queries are still highly transferable.

That's the point, they are probably transferable in the "copy, paste and tweak" sense which would be highly frowned upon in other popular languages.

> But which data/processing related tasks is SQL a very poor fit for?

I specifically called SQL "a DSL for data processing", I probably should have written "there are many domains that SQL is a very poor fit for" to be more clear.

>In my experience it's much more common to actually see the opposite - inefficient usage of ORMs, exporting datasets only to then perform pre/postprocessing, often requiring building programming "jigs" to circumvent bottlenecks, etc when it could've been done in a more streamlined manner in the DB/warehouse.

Totally agree. Lukas Eder has some nice presentations about it.

Got it, I see what you mean, all fair points.

>Totally agree. Lukas Eder has some nice presentations about it.

Thanks - I googled him and instantly recognized the JOOQ blog, quality stuff.

>Totally agree. Lukas Eder has some nice presentations about it.

Thanks for the shout out! For the record, that's probably the referenced talk: https://www.youtube.com/watch?v=wTPGW1PNy_Y

It's nonsense. First off there is no such thing as "large scale programing". Maybe they mean big data? SQL is just a description of what data you want, it has nothing to do with how that query is implemented or optimized. Unless you think Spark or Bigquery aren't appropriate for large data, in which case I'd like to see how removing SQL helps.
Reusing SQL might be a smell?

I have seen people go crazy in stored procedures or via ORMs with super complex queries.

Avoid complex queries (with more foresight about how to structure, cache and query data) and you might not need the SQL code reuse.

There might be some less than ideal bits you would live to DRY up and would if it were OO or Functional Programming but perhaps let it be in SQL.

SQL is performance programming anyway so you are allowed!

> there are no SQL libraries worth talking about

I assume you mean libraries of SQL code (e.g. query fragments/templates) as opposed to libraries for working with SQL?

Libraries written in SQL. I'm sure one can search for let's say a snippet that will create a date dimension table for a data warehouse and copy it but it's pretty clunky compared to even Python modules.

I'm criticizing SQL as a language which is lacking in composability (as opposed to criticizing relational algebra or relational data model).

Libraries for working with SQL (full-fledged ORMs or simpler query builders) are themselves written in a different language so they don't necessarily prove anything about SQL itself, though one could argue that if people want to use them then they might not be satisfied with raw SQL.

A library written in SQL is called... a view. If you are repeating yourself in SQL to the extent you're reaching for a library, you might be doing it wrong. Take advantage of your DB's facilities instead of fighting it.

Or alternatively, if it's for things that really are mutating state and involving biz logic... a stored procedure.

Date & Darwin proposed the addition of "operators" to the relational algebra in their "Tutorial D" description of alternatives to SQL. That is, instead of "stored procedure", the addition of a type system matching on relations ("tables") and their tuples ("rows") and then the ability to create user-defined programmatic "operators" (a bit like OO methods) for those types.

I think the point was that it's not really easy to broadly share useful libraries of views or stored procedures. At one point Oracle was working on an integration that would allow you to package stored procedures written in JavaScript as node modules and then just install them to any database. Not sure what came of that.
> it's difficult to reuse SQL code which is why there are no SQL libraries worth talking about

I feel that it isn't SQL that is actually difficult to re-use. It's what SQL was designed to describe that is difficult to re-use: business entities.