by robconery 2960 days ago
OP here - as a matter of fact I try to do just this, starting with the database. I'm mostly a data person so I try to think through, as deeply as I can, what I should expect in every table - there has to be a sensible default and if I can't find one then I rethink my design. You'll probably disagree with me and grunt out another single-sentence missive, which is fine, but I think it's worth taking some extra time and using Null as a bit of a warning. It's a crutch! A way to stop thinking and say "whatever, I don't know what this value is supposed to be, so... it's null. Let's go shopping!"
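For illustration, here is a minimal sketch of what "a sensible default in every table" can look like in SQL, using Python's built-in `sqlite3` module. The `accounts` table and its columns are hypothetical; the point is only that `NOT NULL` plus a `DEFAULT` clause lets an insert omit a column without ever storing a NULL.

```python
import sqlite3

# Hypothetical schema: every column is NOT NULL, and the columns that have a
# defensible default declare it with a DEFAULT clause.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE accounts (
        id          INTEGER PRIMARY KEY,
        email       TEXT    NOT NULL,
        is_active   INTEGER NOT NULL DEFAULT 1,
        login_count INTEGER NOT NULL DEFAULT 0
    )
    """
)

# Only email is supplied; the other columns fall back to their defaults.
conn.execute("INSERT INTO accounts (email) VALUES (?)", ("ann@example.com",))
row = conn.execute("SELECT email, is_active, login_count FROM accounts").fetchone()
print(row)  # ('ann@example.com', 1, 0)
```

This only works for columns where a default is genuinely meaningful (a new account has logged in zero times); email has no such default, so it stays `NOT NULL` without one.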

Frankly, the idea that there must be sensible default worries me.

Take a database of people. There is literally no sensible default for name, age, gender, height, weight, Social Security number...

If you’re Amazon, your products have no sensible default for manufacturer, shipping weight/size, delivery address...

In fact, for just about any real-world data, there simply is no sensible default for anything at all. Most “sensible” defaults will eventually bite you in the arse. The only sane way to keep nulls from your DB is to refuse inserting incomplete data in the first place, and propagate the error to the user. Heavens save your team if you’re dealing with batch data and insist on not allowing nulls in the DB, though.
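"Refuse inserting incomplete data and propagate the error to the user" can be sketched with a `NOT NULL` constraint whose violation is caught and turned into a message. This assumes SQLite via Python's `sqlite3`; the `people` table and the `add_person` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL, birthdate TEXT NOT NULL)"
)

def add_person(name, birthdate):
    """Insert a person, or return an error message for the user instead of storing NULL."""
    try:
        conn.execute(
            "INSERT INTO people (name, birthdate) VALUES (?, ?)", (name, birthdate)
        )
        return None
    except sqlite3.IntegrityError as exc:
        # The database refuses the row; surface the reason instead of guessing a value.
        return f"Cannot save {name!r}: {exc}"

print(add_person("Ann", "1990-04-01"))  # None
print(add_person("Bob", None))          # error message, row not stored
```

In a batch load, every row that trips the constraint has to be reported and handled somewhere, which is the pain the comment warns about.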

You can sweep this mess under a rug and pretend you have no nulls by turning things into relations that are allowed to be empty — “there are no delivery_address rows for this user” — but that’s a null in sheep’s clothing. Either your application knows how to deal with the query coming up empty, or it doesn’t.
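A minimal sketch of the "relation allowed to be empty" approach, assuming SQLite via Python's `sqlite3`; the `users` and `delivery_address` tables and the `delivery_addresses` helper are hypothetical. No NULL is ever stored, but the caller still receives an empty result it has to handle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE delivery_address (
        user_id INTEGER NOT NULL REFERENCES users(id),
        address TEXT    NOT NULL
    );
    INSERT INTO users (id, name) VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO delivery_address (user_id, address) VALUES (1, '1 Main St');
    """
)

def delivery_addresses(user_id):
    """Return all stored addresses for a user; an empty list means 'none known'."""
    rows = conn.execute(
        "SELECT address FROM delivery_address WHERE user_id = ?", (user_id,)
    ).fetchall()
    return [address for (address,) in rows]

print(delivery_addresses(1))  # ['1 Main St']
print(delivery_addresses(2))  # [] : no NULL in the table, but still "nothing" to handle
```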

What do you use in a database when you have a field where you literally do not know what the value should be?
If you have a PEOPLE table and some birthdates are unknown, then remove the "birthdate" column and make another table called PEOPLE_BIRTHDATES with a "birthdate" column and a foreign key pointing to PEOPLE. Now your queries can have lots of left joins. The results will still have nulls, however.
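The split-table design and the NULL that reappears in a left join can be reproduced in a few lines. This is a sketch assuming SQLite via Python's `sqlite3`, with lowercase table names standing in for PEOPLE and PEOPLE_BIRTHDATES.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE people_birthdates (
        person_id INTEGER PRIMARY KEY REFERENCES people(id),
        birthdate TEXT NOT NULL
    );
    INSERT INTO people (id, name) VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO people_birthdates (person_id, birthdate) VALUES (1, '1990-04-01');
    """
)

# Neither table contains a NULL, yet the LEFT JOIN produces one for Bob.
rows = conn.execute(
    """
    SELECT p.name, b.birthdate
    FROM people AS p
    LEFT JOIN people_birthdates AS b ON b.person_id = p.id
    ORDER BY p.id
    """
).fetchall()
print(rows)  # [('Ann', '1990-04-01'), ('Bob', None)]
```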
Which is the reason why you shouldn't write outer joins.
So if we don't know the customer's birthdate we can't serve her? I can imagine a problem with that...
Sigh.

Where have I said any such thing?

If there's no row for the customer in the joined table, the customer won't show up in an inner join.
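The inner-join behaviour described here can be checked directly. The sketch below assumes SQLite via Python's `sqlite3`; the `customers` and `customer_birthdates` tables are hypothetical. The customer without a birthdate row is absent from the join and only shows up through a separate query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE customer_birthdates (
        customer_id INTEGER PRIMARY KEY REFERENCES customers(id),
        birthdate   TEXT NOT NULL
    );
    INSERT INTO customers (id, name) VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO customer_birthdates (customer_id, birthdate) VALUES (1, '1990-04-01');
    """
)

# Inner join: only customers that have a birthdate row survive.
joined = conn.execute(
    """
    SELECT c.name, b.birthdate
    FROM customers AS c
    JOIN customer_birthdates AS b ON b.customer_id = c.id
    """
).fetchall()

# Customers without a birthdate have to be found with a separate query.
missing = conn.execute(
    """
    SELECT name FROM customers
    WHERE id NOT IN (SELECT customer_id FROM customer_birthdates)
    """
).fetchall()

print(joined)   # [('Ann', '1990-04-01')]
print(missing)  # [('Bob',)]
```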
> What do you use in a database when you have a field where you literally do not know what the value should be?

You don't.

If a value may not be present for an entity, it's not an attribute of the entity in question, it's an attribute of another entity that has a (0..1):1 relationship to the entity in question.

Normalization eliminates NULL.
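One common way to enforce the (0..1):1 relationship in the schema itself is to make the child table's foreign key also its primary key. This is a sketch assuming SQLite via Python's `sqlite3`; the tables and the `try_insert` helper are hypothetical. Note that SQLite only enforces foreign keys once the pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript(
    """
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- person_id is both the primary key and a foreign key: each person has
    -- at most one birthdate row (the 0..1 side), and every row belongs to
    -- exactly one existing person (the 1 side).
    CREATE TABLE people_birthdates (
        person_id INTEGER PRIMARY KEY REFERENCES people(id),
        birthdate TEXT NOT NULL
    );
    INSERT INTO people (id, name) VALUES (1, 'Ann');
    """
)

def try_insert(person_id, birthdate):
    """Return 'ok', or the database's reason for rejecting the row."""
    try:
        conn.execute(
            "INSERT INTO people_birthdates (person_id, birthdate) VALUES (?, ?)",
            (person_id, birthdate),
        )
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)

print(try_insert(1, "1990-04-01"))   # ok
print(try_insert(1, "1991-01-01"))   # rejected: Ann already has a birthdate
print(try_insert(99, "2000-01-01"))  # rejected: there is no person 99
```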

That's great. Now I do a query. Maybe I use a join. If a row has the "0" case of that (0..1):1 relationship, what do I get?

Or maybe I don't do a join. Maybe I do a separate query. If the query comes back with zero rows, then I... what?

What do I get? You get what you ask for.

Then I... what? Then you do what needs to be done as specified by the business in the case the queried piece of information is unknown.

> What do I get? You get what you ask for.

In the join case, don't I get a NULL in the row that comes back if there isn't an entry in the other table? Or do I just not get a row?

> Then you do what needs to be done as specified by the business in the case the queried piece of information is unknown.

Sure, but how do I represent that condition in my software? With a different class/structure? With a flag that indicates that the other field isn't valid? Or with a null?

From where I sit, normalization doesn't make the problem go away at all.
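The three in-program representations asked about (a different class, a validity flag, or a null) can be put side by side. This is a hypothetical Python sketch; every name in it is made up for illustration. Each version still makes the caller branch on "known" versus "unknown"; they differ in how much the type system helps enforce that branch.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Union

# 1. A different class for each case.
@dataclass(frozen=True)
class KnownBirthdate:
    value: date

@dataclass(frozen=True)
class UnknownBirthdate:
    pass

Birthdate = Union[KnownBirthdate, UnknownBirthdate]

# 2. A flag that says whether the other field is valid.
@dataclass(frozen=True)
class FlaggedBirthdate:
    is_known: bool
    value: date  # meaningless when is_known is False

# 3. A null, written as Optional in Python's type hints.
MaybeBirthdate = Optional[date]

def describe_union(b: Birthdate) -> str:
    if isinstance(b, KnownBirthdate):
        return b.value.isoformat()
    return "unknown"

def describe_flag(b: FlaggedBirthdate) -> str:
    return b.value.isoformat() if b.is_known else "unknown"

def describe_optional(b: MaybeBirthdate) -> str:
    return b.isoformat() if b is not None else "unknown"

print(describe_union(UnknownBirthdate()))               # unknown
print(describe_flag(FlaggedBirthdate(False, date.min)))  # unknown
print(describe_optional(None))                           # unknown
```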