|
|
|
|
|
by westurner
229 days ago
|
|
If you don't handle NaN values, and there are NaNs in the real observations made for example with real sensors that sometimes return NaN and outliers, then the sort order there is indeterminate regardless of whether NaN==NaN; the identity function collides because there isn't enough entropy for there to be partial ordering or total ordering if multiple records have the same key value of NaN. How should an algorithm specify that it should sort by insertion order instead of memory address order if the sort key is NaN for multiple records? That's the default in SQL Relational Algebra IIRC? |
|
Well each programming language has a "sort" method that sorts arrays. Should this method throw an exception in case of NaN? I think the NaN rules were the wrong decision. Because of these rules, everywhere there are floating point numbers, the libraries have to have special code for NaN, even if they don't care about NaN. Otherwise there might be ugly bugs, like sorting running into endless loops, data loss, etc. But well, it can't be changed now.
The best description of the decision is probably [1], where Stephen Canon (former member of the IEEE-754 committee if I understand correctly) explains the reasoning.
[1] https://stackoverflow.com/questions/1565164/what-is-the-rati...