Hacker News new | ask | show | jobs
by jewel 1025 days ago
> SQL NULL can't be far behind.

SQL's NULL is far more useful though, since NULL != NULL. That lets you do things like join two tables with a column and not have the NULLs match.

Consider, for example:

    select * from users where id = null;
That will give 0 results, even if somehow you have a user with a null id.

It's also really nice for non-ID, non-reference fields to represent N/A. For example, user.birthdate can be NULL if the birthdate is not known.

SQL also lets you choose when you want NULL to be allowed for a column, so it's less likely to be a gotcha.

3 comments

> SQL's NULL is far more useful though, since NULL != NULL.

That's not true! (Nor is it false.) NULL = NULL IS NULL, NULL != NULL IS NULL. I can't overstate this: as soon as either (or both) operand of any binary relational operator is NULL, the whole expression is NULL!

I'm not being an asshole, this is the exact mistake that makes SQL NULL the other billion dollar sink!

This is really funny, because i wasn't aware of laying such a beatiful trap.

> as soon as either (or both) operand of any binary relational operator is NULL, the whole expression is NULL!

Isn't is a binary operator?

Because "NULL is NULL" is True (not NULL).

No, IS NULL is an unfortunately named unary postfix operator.
SQL NULL has to be the quintessential example of why programming can't just blindly follow math. The NULL truth table makes all the sense in the world in a mathematical context.. and it is a total pain to program with. Letting NULL == NULL be "true" and NULL == anything else be "false" might be slightly less principled, but it would be soooo much safer and have prevent so very many bugs.

(And it's not like that's some sort of mathematical nonsense either. It makes perfect sense to not create a special value that represent "outside my universe of values" and then refuse to equate any two of them. It makes perfect sense to declare a distinguished "outside my universe of values" and let it be equal to any other such value on the grounds that it is mathematically impossible to witness any differences between them anyhow so within the system they are indeed equal.)

Let's say we're in a situation like this:

    create table people (
        name text primary key,
        age integer
    );
    insert into people (name, age) values ('Bob', 50), ('Alice', null), ('Mallory', null);
We know Bob is 50, but we don't know the age of Alice or Mallory.

Are Bob and Alice the same age? We don't know.

Are Alice and Mallory the same age? We don't know.

If `null` were equal to `null,` we couldn't express missing information like this. Naive queries would still have bugs and conclude that Alice and Mallory were the same age.

Null is a property of living in an uncertain universe. When you start taking measurements of the real world (which every production business process is), you run into the many faces of null.

For example, I used to take data in a lab. Often we'd measure three replica samples. Sometimes I'd spill one of them. Scientific ethics demand I still write down the two measurements I did collect, even if I go get three more samples. For the third, I'd write the pen and paper equivalent of null (striking out that cell in the table).

Were any two of those missing measurements equal? Does a dog have Buddha nature?

I think the mistake here was the name chosen. The NaN concept from floating point has an identical definition and behavior, so calling it something like NaV (not a value) or undefined (similar but not identical behavior in JS) instead might have caused less confusion vs NULL typically having a definition of an implicit Option<T>
I addressed all of that in advance of your posting. We live with the equivalent `null == null` in plenty of other contexts, and not only does it not blow up, it in fact works better and has fewer bugs. It isn't even as if math demands this; it is merely one choice and one that is observably a mistake.
I disagree that it's an arbitrary choice, it's asserting statements as true which we don't know anything about. I also disagree that there is such a distinction between a "mathematical context" and a "programming context" here - what we're talking about is formal logic (rather than proper math), and programming is an application of formal logic. Indeed, that's why we're here in the first place, to discuss how any() and all() arise from set theory and predicate logic.

I don't know what systems you're referring to, but like I mentioned you will get bugs if you naively use either approach. Eg in Rust, Option<T>::None will compare equal, because Rust doesn't have a null concept and that's how Rust enums work. But if you were using None to represent missing information, you'd want to use pattern matching rather than the built in equality. (Missing information is a much less salient concept in Rust than SQL, and Rust very deliberately wanted to avoid having a language-level null concept, so I think both of them made the right choices for their problem domains. Though I agree with this comment[1].)

But I'm getting the vibe that we're not going to convince one another, which is fine, this is an area where reasonable people disagree.

[1] https://news.ycombinator.com/item?id=37271039

> and that's how Rust enums work

It shows that I haven't written Rust in a few months, that's not how Rust enums work, that's how the derive macro for PartialEq/Eq works. D'oh!

Similar to undefined in javascript. (undefined == anything) is always false, even if anything also happen to be undefined.
undefined === undefined is true which is not true for SQL NULL.
Sorry I must have mixed up something. Even == returns true after quick test.
For anyone wondering how to fix the above query so that it works for arbitrary X (which can be NULL or non-NULL):

    SELECT * FROM users WHERE status IS NOT DISTINCT FROM $x
> Consider, for example: > select * from users where id = null; > That will give 0 results, even if somehow you have a user with a null id.

Why is it useful? Can you elaborate?

Not GP, but let's say that null in the query wasn't a literal but was the result of another query that returned null. Let's say that the query returned null because not user existed who met the criteria specified by the query, but because we were doing a complex join or aggregation or something, we didn't simply get back 0 rows.

In that case it would be a bug if we matched with this errant user who's id is null.