> But let's study it a bit. Suppose you are searching an array for a value, and the value is not in the array. What do you return for an index into the array? People often use -1 as the "not found" value. But then what happens when the -1 value is not noticed? It winds up corrupting further attempts to use it. The problem is that integers do not have a NaN value to use for this.

You return (value, found), or (value, error), or Result<T>.

> NaN has value beyond that. Suppose you have an array of sensors. One of those sensors goes bad (like they always do). What value do you use for the bad sensor? NaN. Then, when the data is crunched, if the result is NaN, you know that your result comes from bad data. Compare with setting the bad input to 0.0. You never know how that affects your results.

You return an error and handle the error. You want to know that the sensor is wonky or returning bad data. Also, technically you should use a signalling NaN for "this is an error" and a quiet NaN for "this is just an impossible math result", which makes it even more error-prone. Just return a fucking error if it's a function. Sure, NaN is useful inside expressions, but the handling should happen there and then, and if a function can fail it should return the error explicitly; otherwise you end up with different error handling for different kinds of functions.

> This is why D (in one of its more controversial choices) sets uninitialized floating point values to NaN rather than the more conventional choice of 0.0.

I'd like to see how much code actually uses that as a feature rather than just setting it to 0.0 (or initializing it right away).
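A small Python sketch of the trade-offs argued above, with hypothetical helper names (`find_sentinel`, `find_checked`, `mean`). Python makes the -1 hazard especially easy to see, because `xs[-1]` is a valid index meaning "last element", so an unchecked sentinel reads real data instead of failing. The sensor part contrasts a NaN placeholder with a 0.0 placeholder.

```python
import math

def find_sentinel(xs, target):
    """Return the index of target, or -1 if it is absent (sentinel style)."""
    for i, x in enumerate(xs):
        if x == target:
            return i
    return -1

def find_checked(xs, target):
    """Return (index, found); on a miss the index is None, so misuse fails loudly."""
    for i, x in enumerate(xs):
        if x == target:
            return i, True
    return None, False

def mean(values):
    return sum(values) / len(values)

xs = [10, 20, 30]
print(xs[find_sentinel(xs, 99)])  # 30: the unnoticed -1 quietly reads the last element

index, found = find_checked(xs, 99)
print(found)  # False: the miss is reported explicitly alongside the index

good = [20.5, 21.0, 20.5]
print(mean(good + [math.nan]))  # nan: the bad sensor visibly taints the result
print(mean(good + [0.0]))       # 15.5: plausible-looking, but silently wrong
```

Returning `None` rather than -1 for the missing index is a deliberate choice in this sketch: if a caller ignores `found` and indexes with it anyway, Python raises a `TypeError` instead of handing back a real element.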
And this is great for environments that can support it, but the further down the stack you go, the more prohibitively expensive such safety nets become.
Take data formats, for example. Say we have a small device that records IEEE 754 binary32 (float32) readings. A simple format might be nothing more than a sequence of float32 readings per record, closed by a terminator value.
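A minimal Python sketch of such a record, assuming the encoding spelled out in the next paragraph: a signaling NaN marks a failed reading and the quiet NaN 0xffffffff terminates the record. The specific signaling-NaN bit pattern (0x7FA00000), the little-endian byte order and the function names are assumptions for illustration.

```python
import struct

# Hypothetical bit patterns; both are IEEE 754 binary32 NaNs (exponent all ones).
SENSOR_ERROR = 0x7FA00000   # signaling NaN: quiet bit (bit 22) clear, payload nonzero
END_OF_RECORD = 0xFFFFFFFF  # quiet NaN: quiet bit set; marks the end of a record

def encode_inband(readings):
    """Encode one record; None stands for a failed sensor reading."""
    out = bytearray()
    for r in readings:
        if r is None:
            out += struct.pack("<I", SENSOR_ERROR)
        else:
            out += struct.pack("<f", r)
    out += struct.pack("<I", END_OF_RECORD)
    return bytes(out)

def decode_inband_stream(data):
    """Split a stream of concatenated records back into lists of readings."""
    records, current = [], []
    # Compare raw 32-bit words so the NaN patterns are matched exactly,
    # without round-tripping a signaling NaN through a Python float.
    for (bits,) in struct.iter_unpack("<I", data):
        if bits == END_OF_RECORD:
            records.append(current)
            current = []
        elif bits == SENSOR_ERROR:
            current.append(None)
        else:
            current.append(struct.unpack("<f", struct.pack("<I", bits))[0])
    if current:
        raise ValueError("stream ends inside an unterminated record")
    return records

stream = encode_inband([20.5, None, 21.25]) + encode_inband([19.0])
print(decode_inband_stream(stream))  # [[20.5, None, 21.25], [19.0]]
```

Appending is trivial in this layout: a new record is just more 4-byte words on the end of the stream, and every value stays 4-byte aligned.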
We use a signaling NaN to record an error in a sensor reading, and the encoding 0xffffffff (which is a quiet NaN) to mark the end of the record. If we wanted the validity signaling to be out-of-band, we'd need to encode it as such, perhaps as a "validity" bit preceding each reading, so that every entry becomes a 1-bit flag followed by a 32-bit float.
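A sketch of the bit-flag variant, assuming each entry is a validity bit followed by the float32 bits, packed most-significant-bit first (the bit order is an assumption). Because this sketch has no terminator, the hypothetical `unpack_bitflag_entries` takes the entry count as a parameter. The last line shows the alignment problem: 33-bit entries drift one more bit past a byte boundary each time.

```python
import struct

ENTRY_BITS = 33  # 1 validity bit + 32-bit float

def float_bits(x):
    """Raw IEEE 754 binary32 bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def pack_bitflag_entries(readings):
    """Pack entries as [valid:1][float32:32], most significant bit first."""
    acc = 0
    for r in readings:
        valid = r is not None
        acc = (acc << ENTRY_BITS) | (int(valid) << 32) | float_bits(r if valid else 0.0)
    nbits = ENTRY_BITS * len(readings)
    pad = -nbits % 8  # zero-fill the final partial byte
    return (acc << pad).to_bytes((nbits + pad) // 8, "big")

def unpack_bitflag_entries(data, count):
    """Recover `count` entries; every field needs shifting and masking."""
    acc = int.from_bytes(data, "big") >> (len(data) * 8 - ENTRY_BITS * count)
    out = []
    for i in range(count):
        field = (acc >> (ENTRY_BITS * (count - 1 - i))) & ((1 << ENTRY_BITS) - 1)
        if field >> 32:  # validity bit set
            out.append(struct.unpack("<f", struct.pack("<I", field & 0xFFFFFFFF))[0])
        else:
            out.append(None)
    return out

data = pack_bitflag_entries([20.5, None, 21.25])
print(len(data))                                # 13: 3 x 33 = 99 bits, rounded up to bytes
print(unpack_bitflag_entries(data, 3))          # [20.5, None, 21.25]
print([ENTRY_BITS * i % 8 for i in range(4)])   # [0, 1, 2, 3]: bit offset of each entry in its byte
```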
Now the format is more complicated, and we also have alignment problems, because each entry is 33 bits. We could use a whole byte for the flag instead and accept a little bloat, but we're still unaligned (40 bits per entry), which will slow down ingestion. We could fix that by widening the validity "bit" to 32 bits, but now we've doubled the size of the data. Or perhaps we keep the validity flags in a separate bit array, padded to a 32-bit boundary to deal with alignment issues.
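The byte-flag and 32-bit-flag layouts can be compared with Python's `struct` module, whose `<` prefix packs fields little-endian with no padding. The layout names and helpers are hypothetical; the point is the arithmetic: a 5-byte entry leaves most floats off a 4-byte boundary, while an 8-byte entry keeps them aligned at twice the 4 bytes per reading of the in-band format.

```python
import struct

# Hypothetical entry layouts: flag first, then the float32 reading.
BYTE_FLAG = struct.Struct("<Bf")  # 8-bit validity flag + float32 = 5 bytes (40 bits)
WORD_FLAG = struct.Struct("<If")  # 32-bit validity "bit" + float32 = 8 bytes (64 bits)

def encode_flagged(layout, readings):
    """Encode readings (None = failed sensor) using the given entry layout."""
    return b"".join(layout.pack(int(r is not None), 0.0 if r is None else r) for r in readings)

def decode_flagged(layout, data):
    return [value if flag else None for flag, value in layout.iter_unpack(data)]

def float_offsets(layout, count):
    """Byte offset of each entry's float field (it follows the flag)."""
    return [layout.size * i + layout.size - 4 for i in range(count)]

readings = [20.5, None, 21.25]
print(len(encode_flagged(BYTE_FLAG, readings)), len(encode_flagged(WORD_FLAG, readings)))  # 15 24
print([o % 4 for o in float_offsets(BYTE_FLAG, 4)])  # [1, 2, 3, 0]: most floats misaligned
print([o % 4 for o in float_offsets(WORD_FLAG, 4)])  # [0, 0, 0, 0]: aligned, at double the size
```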
But now we've lost the ability to do ad-hoc appends (we have to precede each record with a length), and the format is becoming a lot more complicated.
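A sketch of the separate-bit-array layout, assuming a record is a 32-bit reading count, then the validity bitmap rounded up to whole 32-bit words, then the float32 readings (the field order and function names are assumptions). Every float stays 4-byte aligned, but the writer must know the count before it can emit the header and bitmap, so a reading can no longer simply be tacked onto an open record.

```python
import struct

def encode_bitmap_record(readings):
    """Hypothetical layout: [count:u32][bitmap: whole u32 words][count x float32]."""
    n = len(readings)
    words = [0] * ((n + 31) // 32)  # bitmap padded up to a 32-bit boundary
    for i, r in enumerate(readings):
        if r is not None:
            words[i // 32] |= 1 << (i % 32)  # bit i set means reading i is valid
    values = [0.0 if r is None else r for r in readings]
    return struct.pack(f"<I{len(words)}I{n}f", n, *words, *values)

def decode_bitmap_record(data, offset=0):
    """Decode one record at `offset`; return (readings, offset of the next record)."""
    (n,) = struct.unpack_from("<I", data, offset)
    nwords = (n + 31) // 32
    body = f"<{nwords}I{n}f"
    fields = struct.unpack_from(body, data, offset + 4)
    words, values = fields[:nwords], fields[nwords:]
    readings = [v if (words[i // 32] >> (i % 32)) & 1 else None for i, v in enumerate(values)]
    return readings, offset + 4 + struct.calcsize(body)

stream = encode_bitmap_record([20.5, None, 21.25]) + encode_bitmap_record([19.0])
print(decode_bitmap_record(stream))      # ([20.5, None, 21.25], 20)
print(decode_bitmap_record(stream, 20))  # ([19.0], 32)
```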