Hacker News new | ask | show | jobs
by Jemaclus 1456 days ago
> And, if you can, please, limit the use of anything that can hold a NULL.

I'm curious: what's the alternative to NULL? I'm struggling to think of a database where NULL wouldn't be super useful. It feels like NULL as a concept is almost required, but I think you're suggesting that's a faulty assumption.

Would love to hear more about this.

3 comments

The article probably means: Define anything as non-nullable which can be non-nullable. Unfortunately SQL defaults to nullable, so there is a tendency to define too many columns as nullable. Normalization can also reduce the need for nullable columns in base tables (but you will get them back if you perform an outer join, so it is not a panacea).

But if a columns truly has unknown values, NULL's are the best ways to represent it. It is sometimes suggested to use "sentinel values" like empty string or -1 to represent missing values, but IMHO this is much worse than NULL's, since these will be treated as regular values by operators. When you have missing values, you want three-valued logic.

Gotcha. That makes sense. Thank you!
not op, but ty.
Noticed I said "limit", and not "eliminate". The concept of NULLS in an RDBMS has been discussed and argued for decades. Three valued logic is generally not well understood, and as such, it's usually skipped over. Binary logic is easy...0 or 1. It's there, or not. OFF/ON. 3 valued logic introduces a third state: "unknown". It really means that there is no meaningful answer...not yet, anyway. In simple terms, NULL in RDBMS is dangerously often equated to 0 (zero) in numeric fields. Or "space" in character fields. Neither are true. On occasion, these might behave as such...but you are playing with fire here. What's worse, you can't compare a NULL to a NULL. NULL != NULL. Which is why you often see the "IS NULL" operator used in DML for such things. What it boils down to is that your applications need to pay careful attention when digging around (read: joining) tables with NULLS. Additional code logic is often required to ensure that things work the way you expect them to when NULLS are involved. Formal primary keys cannot NULL (this is enforced by the RDBMS) but it does not stop ad-hoc clever queries from including NULL columns as part of the "where..." clause. So what do? You can tell your DBA to ensure that all columns are NOT NULL. This really tightens things down, and makes some operations a bit more sane. However, if a column value is actually not known (yet!) then one is forced to populate it with data that may not be correct/relevant. These are often called "sentinal" values and can cause a mess of their own. There are use cases where a RDBMS schema with everything as NOT NULL can make sense. In my experience, databases whose data is never (directly) seen/input by actual people can work. When a human sees a field with "placeholder value" instead of just blank space..it is uncomfortable. My advice is to really understand why something might be NULL, and don't blindly add a mess of columns to a table as NULL because it's easy. Remember, that shit will live forever. Google around for "three valued logic" and start down the rabbit hole. Long-term (think: migrating from one RDBMS impl to another) you will absolutely find that NULLs don't behave the same. Various operations may or not be consistent from one to another...and this will break your apps. The key (haha) relationships modeled in your schema...if you strip all the unimportant stuff away...should avoid NULL. The flip side of this is to do a code scan (app side) and search for "is NULL" , "is NOT NULL" in the embedded SQL. Especially when there are a lot of "and ____ IS NOT NULL and ___IS NOT NULL" and so forth. This will indicate those parts of the database that are "hot spots" for NULL issues. I have seen SQL where 80% of the DML is taken up with NULL handling of some kind.
Ah, that makes a lot of sense. Thank you for the explanation!