Hacker News new | ask | show | jobs
by d33 1213 days ago
I'd say that pretty much everything about SQL is bad. Apart from composability, the syntax is super annoying (the ordering of keywords), many databases have pretty much no error reporting if you get it wrong and the linters suck. You can't unit test it. It's not very well standardized and it's actually more than one language (DML vs DDL). Plus, it's not very readable. And the declarative nature of the language is eventually going to back-stab you when you discover yet another edge case of your query planner that flips a reasonable plan to one that's literally going to take ages.
4 comments

Few things: Regarding unit testing, it's now very possible to productively unit test a SQL database - I'm familiar with a Java package that boots up an embedded PostgreSQL, obviously Docker is now in vogue, etc, etc. It's not quite 'just run a function', but you write your code in the same style as a unit test and you're running hundreds of tests in seconds. You can use transactions to automagically clear out your state, which means performance is also excellent. Regarding the standardization of the language - no, it's not totally standard (although arguably that's a good thing, we don't want 10 implementations of exactly the same thing, they'd never evolve), but because they all work sort-of-the-same, a lot of language support ends up transferrable. You can use ActiveRecord on many backing databases, for example. Query planner problem is definitely true, know folks who've spent a lot of time hinting Oracle and scaring Postgres into the same, though personally only took a few days.

However, SQL is IMO the best of a bad bunch. Most non-SQL databases I've used have been extremely sharp cornered, whether this be via: unpredictable/unbounded worst case performance in pessimal storage cases (RocksDB, Cassandra), various parts of the code where unbounded memory/storage usage is required (Cassandra, Spark, Elasticsearch), extremely low level APIs making it easy to make unobservably (as in, you won't notice it and will eat the pain every read) slow APIs (RocksDB). I'm not saying there's nothing better, but it's probably not an all-rounder.

There are certainly things I don't like about most SQL databases. Few SQL databases are columnar, which is _the_ key to good performance in the 2010s, and native LZ4-style data compression is also not widely used. But for most engineers they solve a lot of problems for you without having to think about them.

> I'm familiar with a Java package that boots up an embedded PostgreSQL, obviously Docker is now in vogue, etc, etc.

I think you are talking about Testcontainers, although it only matches the second part of the comment (can manage different external moving parts in containers for integration tests): https://www.testcontainers.org/

I'd be interested in seeing an embedded Postgres library; I only see H2 and maybe Derby being used for tests and "getting to know the ropes" use cases, eg. a default install.

Pretty sure they are talking about https://github.com/zonkyio/embedded-postgres-binaries. It's not exactly embedded postgresql since it still runs the full postgresql in a separate process but it is close enough for most purposes.
Cool, thank you!
Let’s pretend at least half of this is true. With that, ORM pretty much adds a layer of complexity on top and hides it, opening a door for a lot more well complex behaviour. That’s leaky abstraction. As often is the case, adding on top of that person using an ORM without actually knowing _any_ SQL and/or what’s behind ORM just rolls the ball more into the mud.

So on one hand, most of the times people are doomed to re-invent an ORM.. poorly. On another, if you don’t have people knowing behind the scenes using “a proper ORM”(tm) they will roll mud around, and if you do have people knowing, most will object ORM in the beginning and start, over time, to roll their own. Overall, there must be a thin line where one should walk on, but it’s unclear exactly where since it’s highly project-specific.

You can unit test SQL by spinning up Docker containers.

You should just write SQL for the database you are using (e.g. PostgreSQL). Switching databases without changing queries is unrealistic anyways.

I agree with your other points, but I am yet to see a layer above SQL that improves on them enough to justify its additional complexity.

I know a team who used JooQ (code-generated query builder for Java) to generate SQL which worked on both Postgres and Oracle and worked well enough that they could support both in production (Oracle for top end perf, Postgres for everything else). There were certain places where they had to switch on SQL format, but over years they felt comfortable enough only unit testing on one kind of database.
Here we go…

SQL is composable through views, CTEs, temp tables, and set-returning functions. It's a set-oriented language, so its composability is going to be based on sets, not the constructs in your favorite general purpose programming language using a completely different paradigm. Just because that's unfamiliar to you doesn't deny its composability. Impedance mismatch.

Regarding syntax, the best "solution" I've seen are query builders that put the FROM first but fail to fully implement the full range of functionality available to an engine, mostly due to catering to the lowest common denominator of functionality. (Looking at you, MySQL!)

This "I hate the syntax" is generally a petty argument. No popular multiuser database allows direct access to its internals, but I don't see Cypher or Mongo's QL getting anywhere near the same level of critical scrutiny SQL gets, and it's not due to their clarity, elegance, or comprehensiveness. You simply want a functional (or OO or whatever) syntax model, and it's not gonna happen because the problem SQL is solving doesn't match up. Impedance mismatch.

Unit testing: https://pgtap.org/ https://sqitch.org/docs/manual/sqitch-verify/ https://learn.microsoft.com/en-us/sql/ssdt/creating-and-defi...

SQL linters suck, I'll grant you that one. At least the syntax and error highlighting is there for good editors like DataGrip, but folks have a hard time accepting a universal style guide for their SQL though it's sorely needed along. Most folks just cobble together stuff with haphazard newlines and spacing then complain about how no one can read each others' SQL.

Wait… you're pissed there's a different set of keywords for generation and modification of structure vs generation and modification of data? You think… they should be the same… because… you think they're related?

> Plus, it's not very readable.

Ever notice how someone in every thread on every language says, "I hate the syntax. I think it should look like <insert personal favorite language>." Just accept that your difficulty with the language is a personal preference or reflection of your familiarity with it, not some kind of objective fact. It's like claiming Chinese is less understandable than English when you were born and raised in London.

> …when you discover yet another edge case of your query planner…

That's not SQL's fault. Any query language would do this. If you had to write accessors by hand such as for DynamoDB, you'd find that the original algorithm you wrote is no longer effective because the dataset you're querying got much larger. Fix your indexes, vacuum, and reset statistics. Stop blaming your tools.

You don't like SQL? That's fine. But that hatred is also not universal. I and many other folks LOVE SQL. That isn't to say it's flawless or cannot be improved. But most folks always seem to focus on those aspects that probably don't matter as much, missing the relational forest for syntactical trees.

"Beware the old man in a young man's game. He is always much more capable than he seems."

> It's like claiming Chinese is less understandable than English when you were born and raised in London.

There is a proverbial consensus that written Chinese is very difficult to understand [0, 1]. It is a supremely challenging script. A well educated Chinese person can conceivably encounter words in a written text that they know the meaning of but they cannot understand or verbalise without consulting a reference book.

[0] https://en.wikipedia.org/wiki/Greek_to_me

[1] https://flowingdata.com/wp-content/uploads/2015/04/Greek-to-...

And folks speaking any of the variations of Chinese speech, eg. Cantonese, can still read the ideographs. That's the tradeoff.

Whereas in English, a well-educated American can (and does) conceivably encounter words in written English that they neither know the meaning of, understand, or pronounce correctly without consulting a dictionary.

Hell, English pronounces the exact same written words in wildly different ways as well as multiple words spelled differently but spoken identically to the point that a non-trivial part of the written language is merely memorized rather than learned.

But we learned English when we were young, so I guess it seems "normal" that "Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo" is a grammatically correct sentence.

https://en.wikipedia.org/wiki/Buffalo_buffalo_Buffalo_buffal...

Or how animal poops can have so many semantic meanings.

https://youtube.com/shorts/z_AGi2diHt8?feature=share

I wish SQL operated on sets but it really operates on tables that have an order and may contain duplicates.
LATERAL, VALUES, et al would object to that narrow designation.

Tables are just sets of records. Nothing more. Nothing less. SQL references tables as sets of records, but JOINs are referential subsets of tables, not complete tables themselves. And LATERALs may not even have a table underneath at all! Same with set-returning functions. generate_series(…) doesn't have any table under the covers; it's purely a dynamic set of values generated on the fly.

It's turtles all the way down.

As for duplicates, that's up to your defined constraints to sort out. You're right that SQL DML knows nothing about whether the underlying set of records contains duplicates. The DDL however can trivially define a set of records to have no duplicates based upon varying conditions/attributes.

Whether a set is ordered or not or has duplicates are not has no bearing whatsoever on whether the data is set-oriented or not.