Hacker News new | ask | show | jobs
by chasil 999 days ago
MERGE is standardized in ANSI SQL 2003, and updated in 2008:

https://en.wikipedia.org/wiki/Merge_(SQL)

No question, everyone should use standardized, cross-platform syntax if at all possible.

Maybe someday ANSI will standardize some kind of upsert syntax; until that time use upsert only if you are explicitly tying your SQL to your particular database.

3 comments

> No question, everyone should use standardized, cross-platform syntax if at all possible.

There are big questionmarks around this statement. SQL isn't an interchange format. Standard table design goes a long way because data has lots of different consumers and needs frequently to be moved around between systems, standard SQL is nothing useful to aim for. Everyone already uses DB-specific drivers and syntax because it is better than the alternative.

- The standard is not generally available. Most of us will never learn what is in it.

- SQL standardises terrible syntax. The SQL standards committee has a unique view on programming where "function_name(a, b, c);" is some sort of weird alternative syntax. They haven't gotten as far as working out variables or function composition either [0].

- Tying SQL to your specific database is the best option for performance. Writing database-independent SQL is somethign of a fools errand, it is either trivial performance-insensitive enough that you should be using a code generator or complex enough to deserve DB-specific turning.

[0] I assume. Not going to pay for a copy of the standard to check.

> The standard is not generally available. Most of us will never learn what is in it.

It also not aimed at users, but at implementors. Funny enough, they don't read the standard either ;) But more seriously: Some implementations are old and generally vendors prefer not changing the behavior of their product. When the standard comes later, the train has already departed. The most critical incompatibilities are in the elder features. The newer ones have a tendency the be more aligned with standard behavior (e.g. window functions are typically implemented just fine).

> They haven't gotten as far as working out variables or function composition either [0].

Part 2 SQL[0] is declarative and intentionally doesn't have variables. Part 4 SQL (aka "pl" SQL) does have variables. I personally consider Part 4 obsolete on systems that have sufficient modern Part 2 support.

> Tying SQL to your specific database is the best option for performance. Writing database-independent SQL is somethign of a fools errand, it is either trivial performance-insensitive enough that you should be using a code generator or complex enough to deserve DB-specific turning.

While this is certainly true for some cases there are also plenty of examples where the standard SQL is more concise and less error prone than product-specific alternatives. E.g. there COALESCE is the way to go rather than isnull, ifnull, nvl, or the like (typically limited to two arguments, sometimes strange treatment of different types).

There is a lot of *unnecessary* vendor-lock in in the field.

[0] https://modern-sql.com/standard/parts

> Part 2 SQL[0] is declarative and intentionally doesn't have variables...

The issue I have with that is that making a bad decision for a reason doesn't change the fundamental correctness of the decision.

SQL semantics clearly support variables, because they are available as WITH ... clauses and subqueries. So either these elements are a mistake and should be discouraged, or the language has variables and they should be made to read like variables in almost all other languages for consistency and readability.

This is literally a cosmetic issue but a lot of thought on syntax has happened since the 1970s and there seems to be an overwhelming on-the-ground consensus that SQL's syntax is not the way to lay code out. The semantics? Great. But there is nearly nothing from SQL's syntax choices that is present in any surviving programming language since. Many of the more popular languages explicitly reject SQL syntax decisions like Python (significant whitespace), "function(arg,arg,arg)" instead of constructs like "VERB arg,arg WITH TEXT,arg TEXT arg" (everything I've met bar lisp) for functions and not replicating broken English (everything).

Variables are a different thing because the declarative nature of SQL is a useful property. But standard SQL overall does such an appalling job of handling syntax I feel confident that they have gotten this part of the syntax wrong too.

Gotten a bit off topic, just happens to be a thing I feel strongly about.

Lisp is in no way closer to SQL syntax than languages like python. On the contrary, lisp syntax is much more regular.
Sigh, I never wanted to read the SQL standard before, but now that I know it’s behind a pay wall I want to read it.
I don’t believe there is a SQL standard.
What most users want is a statement that allows them to UPSERT, with reasonable guarantees around that never throwing a duplicate violation error, and never deadlocking (at least not in any novel way caused by implementation details). Something practical.

That's not the kind of thing that the standard takes a position on, for a variety of reasons. In general, it says very little about concurrency control/isolation. And what little is does say is all very abstract.

For whatever reason, a lot of people believe that MERGE promises something over and above what using multiple INSERT, UPDATE, and DELETE statements will promise. But that simply isn't true. The standard doesn't say anything that even vaguely suggests otherwise. Same is true of both the SQL Server MERGE docs, and the Oracle MERGE docs.

Before you recommend anything, read this about MSSQL's dangerously borked MERGE https://www.mssqltips.com/sqlservertip/3074/use-caution-with...

Anyone using MSSQL should read it.