Hacker News new | ask | show | jobs
by woevdbz 1445 days ago
> Personally, I think that the existing PostgreSQL behavior (NULLS DISTINCT) is the "right" one

SQL is old enough and this debate so unsettled still that I think it should be clear there isn't a categorically right behavior here anymore than there is a clear winner between tabs and spaces.

As an example of the irreconciliable weirdness of NULL, consider that "NULL = NULL" is false, and so is "NULL != NULL", while rows with NULL still group together in GROUP BY.

I appreciate you giving folks the option.

4 comments

> As an example of the irreconciliable weirdness of NULL, consider that "NULL = NULL" is false, and so is "NULL != NULL

This isn't quite true - these comparisons don't evaluate to false, they evaluate to NULL.

I intuitively think of NULL as "unable to compute," a generalized NaN.

> these comparisons don't evaluate to false, they evaluate to NULL

In the databases I've used, such as MSSQL[1], they evaluate to UNKNOWN.

[1]: https://docs.microsoft.com/en-us/sql/t-sql/language-elements...

The document you link uses unknown as a synonym for null. If you inserted the result of such a comparison into a table, the value inserted would be NULL.
I'm self-taught, could you show how to insert the result of such a comparison into a table in mssql? I can only think of using CASE or similar, which would mean no direct conversion.
insert into table (c) values (null != null)
Can you provide a complete working example? I'm just getting "incorrect syntax near !" with both mssql and sqlanywhere.

    create table #nulltest (a bit null);
    insert into #nulltest(a) values (0),(1);
    insert into #nulltest(a) values (null != null);
It's good to see that everyone agrees on the definition of NULL.
Yes: a billion-dollar mistake.
This is s misunderstandig of Hoares “billion dollar mistake”. The mistake was not the existence of nulls, the mistake was type systems where every reference could be null.

Sql does not have this problem.

And with RDBMSs, a multi-trillion dollar success!
This is a good way to remember the SQL behavior. Though I do hope it'll change in the future. So many other languages interpret null as a comparable value, and it will be less surprising to folks coming from those.
I personally don't have issues with languages that have incomparable values, just with languages that have a generic equality and incomparable values.

If every type has it's own equality operation (e.g. Number.eq) then I have no issue with it being weird in unexpected types (e.g. Number.eq(NULL, NULL) =false)

For example in JavaScript almost every use of === should be replaced by Object.is.

I agree, however that's all fine and well but it turns the languages syntax into dogshit. A babel plugin (transpiling) would be ideal.

On the same note, the original intent was for NaN to be insidious and propagate into every derived calculation or comparison, thus making it easy to tell if a mathematical function had failed, without having to add conditionals at every step within it. Which makes a lot of sense. However, not being able to know exactly where the NaN was produced is kind of shit. Now, if NaN had a stack trace property like Error (exception) objects...it would make this behavior much more useful. Makes me wonder if the prototype of NaN can be extended, hm.

NaNs cannot be extended in any way, in many interpreters NaN is used to "hide" pointers in 64 bit floats using the 52 free bits in NaN values.

IMO NaN should have never existed, like signed zeros, (I am more agnostic about [signed] Infinity) I think that they were implemented in IEEE 754 because the encoding had space left for them.

I not an expert in any way on this this is literally a personal opinion.

Right. I was thinking that maybe NaN in JS was more than just the IEEE version we see in languages closer to the metal.
> This isn't quite true - these comparisons don't evaluate to false, they evaluate to NULL.

Which large parts of SQL treat as equivalent to false.

    WITH vars AS (
        SELECT CAST(null AS INT) as X
    )
    SELECT 'yes this is a tautology' as "is this is a tautology"
    FROM vars
    WHERE X = X
The above query returns nothing, not an error because the value of X is unknowable. In a sane language, X=X and so it would yield "yes this is a tautology". SQL standard wants it both ways - it wants to say "boolean null isn't false" except when for its own convenience it decides that yes, null is false.

The truth, imho, is that SQL has an archaic type system. In this day and age it should be offering algebraic types where we can implement our own "empty" values and the like that may not use three-value logic as appropriate, so that I can say that for example End Date is blank instead of either unknown or Date.MaxValue.

> SQL standard wants it both ways - it wants to say "boolean null isn't false" except when for its own convenience it decides that yes, null is false.

I don't see it that way. Null cast to boolean is always false. Null equality is undefined. There is no contradiction there.

I think that this topic gets tripped up with people wanting to assume boolean means two possible states... NULL adds an extra state to every type much like the "bottom" type/value in some functional programming languages or type systems.

It's not the `=` operator that is special or significant here. It is the `WHERE` clause semantics, which are defined as returning rows when the boolean expression is true. This treats both false and null cases as the other possible outcome, when rows are not returned. Not understanding this distinction is also where people write buggy queries using WHERE NOT ..., because like equality the NOT operator will return NULL for a NULL input (so NULL is not like false and does not negate to true).

SELECT 'hidden by null result' WHERE CAST(NULL AS boolean);

SELECT 'also hidden by null result' WHERE NOT CAST(NULL AS boolean);

SELECT 'shown by true result' WHERE CAST(NULL AS boolean) IS NULL;

SELECT 'hidden by false result' WHERE NOT (CAST(NULL AS boolean) IS NULL);

The existing GROUP BY behavior is different than WHERE because it does not have a special rule for handling true values. It just returns a group for each distinct value.

The prior (new default) UNIQUE constraint behavior made sense when you think of NULL as this bottom value and meaning "unknown", while also thinking that a uniqueness constraint is only applied to complete keys. The check is effectively deferred, and the partial or incomplete tuple might become valid and unique once its missing part(s) are filled in with real values. The new optional behavior makes more sense if you think that the uniqueness constraint enforces distinct records even with partial keys, just as the GROUP BY clause will determine distinct groups.

There is also a similar issue with FOREIGN KEY constraints. The default MATCH SIMPLE behavior is consistent with the default UNIQUE constraint behavior. An incomplete foreign key is essentially ignored and not subject to integrity checks. The MATCH FULL behavior is almost a style checker to say that a compound foreign key should be all null or all non-null, while still ignoring incomplete foreign keys. As far as I know, the MATCH PARTIAL case is not implemented, but I think it might complement this new UNIQUE NULLS NOT DISTINCT behavior?

One can imagine a further branching point in behaviors for foreign keys and I'm not sure how MATCH PARTIAL will be defined. One mode might be easily implementable with the new index method, if we want partial foreign keys to be able to reference partial keys in the referred table, i.e. the partial key with NULLs is considered an actual row key. But we might also imagine another partial matching mode where the the non-null components of a compound foreign key have to all exist in at least one key in the referred table, while ignoring the null fields as incompletely formed references. This would be an improvement on MATCH SIMPLE which allows the non-null columns in a foreign key to hold values which could not possibly match the referring table no matter which values are filled in for the remaining null fields of the foreign key. However, this enforcement would not be easily implemented without a powerset of indexes for all partial combinations allowed in the referencing tables!

Er, sorry, but I thought that the answer to tabs vs spaces was use tabs for semantic/logical indent, use spaces for physical/alignment indent.
Yes, mixing invisible characters will solve all our problems.
They don't have to be invisible if you use a good editor.
> anymore than there is a clear winner between tabs and spaces.

But there is a clear winner here. Tabs won, almost all code files use spaces. Of course everyone uses the tab key on their keyboard, and the IDE makes spaces for you; no one is going to manually press space 4 times.

There is no winner, both tabs snd spaces are widely used. However with proliferation of better tools the question thankfully became mostly irrelevant. I don't care if it is spaces or tabs, as long as it is consistent across codebase (which CI enforces with autoformatters and linters) and as long as my editor (which obeys project's editorconfig) uses the same convention.
As a savage, I use 4 tabs now.
I hit space 4 times...
...which is clear evidence that hitting space 4 times is the wrong thing to do :P
> SQL is old enough and this debate so unsettled

Is the debate really unsettled?

MSSQL considers nulls to be equal, everyone else considers them to be distinct (aside from a few DBs which just refuse to create UNIQUE indexes on nullable columns), and Oracle is apparently huffling glue in a corner.

> As an example of the irreconciliable weirdness of NULL, consider that "NULL = NULL" is false, and so is "NULL != NULL", while rows with NULL still group together in GROUP BY.

Group by is the exception here. And arguably it is so because the alternative is never useful, or close enough as to not be worth it.

So of the three major SQL implementations, you have one of them following the results of the settled debate, one of them being an exception and one of them huffling glue (whatever that means). In addition, there's a language exception even in the settled group, which you have to handwave away.

Sounds pretty unsettled to me.

Oracle considers null to be equal to empty string.
That is truly huffing glue.
And oracle considers null to be equal to empty string. My understanding is oracle was created before the sql standard existed and they don’t want to change because of backwards comparability. I don’t understand why they can’t add a db parameter to allow you to set how nulls are handled.
IDK if you ever saw the HN thread where they talked about oracle and change tests from a few years back, but if I had to guess, the pain just isn't worth the overall cost.

Overall I remember running into this but finding it fairly simple to add a couple specific checks around the condition where it came up in our code.

They would need to change their storage format to be able to distinguish between NULL and empty string, and probably a lot of in-memory structures and even more code logic as well. It would be hugely expensive to support both.

Oracle introduced VARCHAR2 in the 90s(?) to mean “Oracle semantics” as opposed to VARCHAR which is documented as “may change to standard SQL semantics in a future version”. However I don’t think they’ll ever materialize that distinction.