Hacker News new | ask | show | jobs
by maxbond 1025 days ago
Let's say we're in a situation like this:

    create table people (
        name text primary key,
        age integer
    );
    insert into people (name, age) values ('Bob', 50), ('Alice', null), ('Mallory', null);
We know Bob is 50, but we don't know the age of Alice or Mallory.

Are Bob and Alice the same age? We don't know.

Are Alice and Mallory the same age? We don't know.

If `null` were equal to `null,` we couldn't express missing information like this. Naive queries would still have bugs and conclude that Alice and Mallory were the same age.

Null is a property of living in an uncertain universe. When you start taking measurements of the real world (which every production business process is), you run into the many faces of null.

For example, I used to take data in a lab. Often we'd measure three replica samples. Sometimes I'd spill one of them. Scientific ethics demand I still write down the two measurements I did collect, even if I go get three more samples. For the third, I'd write the pen and paper equivalent of null (striking out that cell in the table).

Were any two of those missing measurements equal? Does a dog have Buddha nature?

2 comments

I think the mistake here was the name chosen. The NaN concept from floating point has an identical definition and behavior, so calling it something like NaV (not a value) or undefined (similar but not identical behavior in JS) instead might have caused less confusion vs NULL typically having a definition of an implicit Option<T>
I addressed all of that in advance of your posting. We live with the equivalent `null == null` in plenty of other contexts, and not only does it not blow up, it in fact works better and has fewer bugs. It isn't even as if math demands this; it is merely one choice and one that is observably a mistake.
I disagree that it's an arbitrary choice, it's asserting statements as true which we don't know anything about. I also disagree that there is such a distinction between a "mathematical context" and a "programming context" here - what we're talking about is formal logic (rather than proper math), and programming is an application of formal logic. Indeed, that's why we're here in the first place, to discuss how any() and all() arise from set theory and predicate logic.

I don't know what systems you're referring to, but like I mentioned you will get bugs if you naively use either approach. Eg in Rust, Option<T>::None will compare equal, because Rust doesn't have a null concept and that's how Rust enums work. But if you were using None to represent missing information, you'd want to use pattern matching rather than the built in equality. (Missing information is a much less salient concept in Rust than SQL, and Rust very deliberately wanted to avoid having a language-level null concept, so I think both of them made the right choices for their problem domains. Though I agree with this comment[1].)

But I'm getting the vibe that we're not going to convince one another, which is fine, this is an area where reasonable people disagree.

[1] https://news.ycombinator.com/item?id=37271039

> and that's how Rust enums work

It shows that I haven't written Rust in a few months, that's not how Rust enums work, that's how the derive macro for PartialEq/Eq works. D'oh!