Hacker News new | ask | show | jobs
by recursive 524 days ago
In nth normal form, you can't have 'no value'. That would mean your model is wrong. In academic relational data books, null does mean "unknown". There is a value, we just don't know what it is (yet).

If there might actually not be such a value, you're supposed to change your schema to reflect that.

2 comments

> There is a value, we just don't know what it is (yet).

In all my years, I've never used null for that. If I don't have a value yet then generally I'm not writing any part of the record. I only ever use nulls for the absence of a value. Creating a new table for every potential optional column is the "academic relational way" and also insane. :)

What happens if your data is produced by some automated process such as a sensor reading and occasionally the sensor fails to return a value? NULL seems exactly the appropriate value to use.
Then you're supposed to use another table with a foreign key to canonical measurement record. This is the concept of fully normalized schemas.

What you're describing is closer to how people do it in practice.

I'm still a bit confused. Suppose you have another table, call it temperatures with columns id and temperature, where every row contains only a valid temperature (no NULL records), and you have a main logging table with date and temperature_id so that you can join on temperature_id = temperatures.id. This seems to be what you mean, with a canonical measurement record table related via the temperature_id foreign key.

But then if your sensor fails to record a measurement don't you end up with NULL for that row's temperature_id?

Foreign key would probably go the other way:

LogEntry(LogEntryId, Date)

Temperature(TemperatureId, LogEntryId, DegreesF)

If there is no temperature measured, then you don't create a record in Temperature.

Ah but then how do you record that the measurement actually happened but did not produce a value? I want a record of that failure.

I mean sure, you could do yet another table. But honestly that level of normalization is much more work than it's worth. Just because it's some academic definition doesn't make it right.

You put a record in the Measurement table, and none in the Value table.

> But honestly that level of normalization is much more work than it's worth

Yes. I question whether it's worth anything to begin with.

     Select M.*, V.* from Measurement as M left outer join Value_table as V…
You end up with nulls again.
Yes. That's part of the semantics of outer joins.

But there is no stored null representing 'no value'.