Hacker News new | ask | show | jobs
by SigmundA 520 days ago
>That's because "different" and "distinct" don't mean the same thing.

The literal definition distinct is:

>recognizably different in nature from something else of a similar type.

If you want to get down to it nothing is "equal" or the same.

Is a temperature measurement 25C the same as another of 25C? No these measurements are an approximation of the actual values which are actually not equal to each other they are distinct they have just been lumped into the same 25C group due to the resolution of measurement yet equality works just fine on that value in sql.

I have used SQL for a long time null handling is weird and inconsistent and a waste of time. For all the language bugs due to the existence of null at least I can count on null=null and not write garbage like value=param or (param is null and value is null)

3 comments

> The literal definition distinct is

Irrelevant. What matters is the meaning in the context of SQL.

> weird and inconsistent and a waste of time. For all the language bugs due to the existence of null

There are necessary, semantic cases that need to be dealt with. How else would you do it?

Also, it's really weird to use "bugs" to refer to well defined and well documented behavior.

I wanted to briefly reinforce this point with the fact that SQL has multiple equality operators - there is both `=` and `IS NOT DISTINCT FROM`. The later operator will treat null values as equal in alignment with the `DISTINCT` and `DISTINCT ON` operators.

It is extremely easy using partial uniques and the proper operators to treat nulls as non-distinct values and the approach we have allows some very important functionality that aligns with standard statistical data treatment which was influential to how SQL logic was originally designed.

> Is a temperature measurement 25C the same as another of 25C?

Yes, the measurements are the same.

The actual temperatures probably are not, but measurements are not the same as the thing measured.

>Yes, the measurements are the same.

By the logic two unknown (null) measurements are the same regardless of the actual value which I agree with.

An unknown measurement isn't a measurement value its a statement of (lack of) knowledge about a measurement, that doesn't tell you what the measurement is. Knowledge about a measurement is as different from the measurement as the measurement itself is from the thing measured.

Whether two unknown measurements are the same is unknown.

Whether two measurements of 25C are the same is unknown, these are just values recorded in a database. 25 is a value, null is a value.

The values in the db are the same in both cases which is what I would like my db language to deal with and not make assumptions about what that value actually means.

I see no value in treating null special when in comes to equality in a sql db, in fact it is a hinderance that it does so in my experience.

The SQL null is a database-specific keyword and not something that's part of the domain of your measurements. If you want some kind of sentinel value for your measurements that doesn't have the "unknown" meaning of SQL null, then you should use your own sentinel value and not reuse null for that purpose
Sentinel values suck especially when the language already has a perfectly good one built in.

Is 0 for a temp measurement unknown sentinel or an actual measurement, how about 2,147,483,647 great probably not a measurement now its always included in greater than queries same with max negative with less than.

Null separates the value into its own distinct group and prevents it from being including in range queries due to it not being an actual numeric value while most languages still allow you to compare equality using standard operators.

Sum types would be great in sql but currently we get a union of the sql type and null, so null for sentinel values it is except for the having to using weird syntax to compare it.

Changing the emphasis.

> recognizably different in nature from something else of a similar type.

But anyways, the point wasn't to justify the choices of SQL but rather as a way to make intuitive sense of its logic. SQL is one of the oldest and most successful programming languages in existence, we are not going to change it, and it is not going to disappear anytime soon, so we have to go with it, like it or not. There have been some attempts at alternatives, both at changing the paradigm (NoSQL) and at cleaning up the language, which, to be fair, would be a good thing, but without much success. The relational paradigm just works, and SQL is usable enough to make the cost of switching not worth it.

Edit:

And writing things like "value=param or (param is null and value is null)" is usually the sign of a poor understanding of the NULL logic. You are using it wrong basically. Sometimes, it is the right thing to do, but if that pattern starts appearing all over the place, it is usually a result of thinking "NULL is broken, I have to use this pattern to handle NULL properly". That's cargo culting, don't fix problems you don't understand by copy-pasting code you don't understand.

Note: this is not addressed to "you" in particular, there can be good reasons, no offense intended. But I think that in general, it is a code smell.

If it is not recognizably different than it is the same in that context correct?

Two measurements of 25C are not recognizably different therefore they are equal, correct, regardless if the actual temperatures are not the same?

Two measurements of unknown are not recognizably different therefore they are equal in the context of the database.

Having null!=null has never been intuitive to me especially since every other programming language treats them equal. I am not hoping this gets changed, I know SQL is to far along for that, I can still complain about it and agree its wierd.

>And writing things like "value=param or (param is null and value is null)" is usually the sign of a poor understanding of the NULL logic.

It's needed with parametrized sql when your db doesn't support "is not distinct from" which is itself a silly way to just write '=' or '==' like a normal programming language. The distinct predict exist for this very reason to have yet another way to express equality that includes nulls: https://modern-sql.com/caniuse/T151