Hacker News
by lmm 2001 days ago
> NaN itself is part of the IEEE 754 floating point standard. It behaves the way it does for a reason.

The reason is that it was a workaround for the limitations of 1980s hardware that has been mindlessly cargo-culted ever since. Silent NaN-propagation is a second billion dollar mistake and equally worth avoiding in new languages.

2 comments

I quibble with your characterisation of IEEE 754 ubiquity being mindlessly cargo-culted, because there’s a runtime cost to doing things any other way. Specifically, all mainstream hardware has instructions that work the IEEE 754 way, so if you want to not propagate NaN, I believe you’ll need to either implement those operations manually, or check for NaN after every operation and convert it into an exception or panic or whatever. I’m not certain what the cost would be, being no compiler engineer or hand-coder of assembly, but I’m fairly confident that there is one. I suspect that cost may also be inordinate on various realistic workloads.
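To make the "check for NaN after every operation" option concrete, here is a minimal Rust sketch. `Checked` and `NanError` are hypothetical names invented for illustration, not from any library, and the sketch says nothing about what the extra branch per operation costs on a real workload.

```rust
/// Hypothetical wrapper around f64 that never holds a NaN.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Checked(f64);

/// Hypothetical error returned when an operation would produce NaN.
#[derive(Debug, PartialEq)]
struct NanError;

impl Checked {
    /// Rejects NaN at construction time.
    fn new(x: f64) -> Result<Checked, NanError> {
        if x.is_nan() {
            Err(NanError)
        } else {
            Ok(Checked(x))
        }
    }

    // Each operation runs the ordinary hardware instruction,
    // then pays for one extra NaN test on the result.
    fn add(self, rhs: Checked) -> Result<Checked, NanError> {
        Checked::new(self.0 + rhs.0)
    }

    fn sub(self, rhs: Checked) -> Result<Checked, NanError> {
        Checked::new(self.0 - rhs.0)
    }

    fn mul(self, rhs: Checked) -> Result<Checked, NanError> {
        Checked::new(self.0 * rhs.0)
    }

    fn div(self, rhs: Checked) -> Result<Checked, NanError> {
        Checked::new(self.0 / rhs.0)
    }
}
```

With this, infinity minus infinity or 0.0 / 0.0 comes back as `Err(NanError)` instead of a NaN that flows silently into later arithmetic, while 1.0 / 0.0 still yields infinity as IEEE 754 specifies.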

(Since you mention billion-dollar mistakes: you can remove null pointers from a language at no runtime cost, in a few different ways—e.g. require that types be explicitly nullable, or Rust-style Option<T> optimised to squeeze the None/Some discriminant into any spare space in T, so that for sized T, &T and Option<&T> both take only one word.)
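The size claim about `&T` and `Option<&T>` can be checked directly. A small sketch follows; `ref_sizes` and `nonzero_sizes` are helper names made up for this example.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

/// Returns (size of `&T`, size of `Option<&T>`) in bytes.
fn ref_sizes<T>() -> (usize, usize) {
    (size_of::<&T>(), size_of::<Option<&T>>())
}

/// Same idea for an integer type whose zero value is spare:
/// `None` is stored as 0, so no separate discriminant is needed.
fn nonzero_sizes() -> (usize, usize) {
    (size_of::<NonZeroU32>(), size_of::<Option<NonZeroU32>>())
}
```

Both functions return a pair of equal numbers: a reference can never be null, so the all-zero bit pattern is free to represent `None`.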

The issue isn't actually IEEE 754 but rather common C implementations; most hardware supports much more in the way of signalling NaNs and floating point exceptions, but most languages blindly follow the way C does it. It's the same with rounding modes; 754 provides a decent range of rounding modes marred by a poor default, but C and everything that follows it expects that default to be used.

> I’m not certain what the cost would be, being no compiler engineer or hand-coder of assembly, but I’m fairly confident that there is one. I suspect that cost may also be inordinate on various realistic workloads.

Back in the PowerPC days, something along these lines happened with Java. Java mandates that integer divide-by-zero throws an exception, but PowerPC doesn't have a mechanism to signal that a divide-by-zero occurred (1/0 == 0, as far as PowerPC is concerned). This forced the JVM to wrap every integer divide with an explicit test for a zero divisor, which put a significant drag on JVM performance...
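The wrapping described above amounts to one compare-and-branch in front of each divide. A rough sketch in Rust rather than JVM-generated code; `guarded_div` and `DivError` are hypothetical names, and this is not what any particular JVM actually emitted.

```rust
/// Hypothetical error type standing in for Java's ArithmeticException.
#[derive(Debug, PartialEq)]
enum DivError {
    DivideByZero,
}

/// Integer divide with an explicit software test for a zero divisor,
/// for hardware that does not trap on its own.
fn guarded_div(a: i32, b: i32) -> Result<i32, DivError> {
    if b == 0 {
        return Err(DivError::DivideByZero);
    }
    // wrapping_div makes i32::MIN / -1 wrap to i32::MIN instead of
    // panicking, which matches Java's behaviour for that case.
    Ok(a.wrapping_div(b))
}
```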

People have been pushing for overflow checks on integer arithmetic in C on the basis that the cost isn't too high. For floating point it can only be lower.

We are probably talking about single-digit percent increases in run time. For some software that's relevant, but not for most.
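For reference, this is roughly what a per-operation integer overflow check looks like where the language exposes it. A Rust sketch; `checked_sum` is a made-up helper, and it does not measure the run-time cost being discussed.

```rust
/// Sums a slice of i32, returning None as soon as an addition overflows.
fn checked_sum(xs: &[i32]) -> Option<i32> {
    // checked_add returns None on overflow; try_fold stops at the first None.
    xs.iter().try_fold(0i32, |acc, &x| acc.checked_add(x))
}
```

`checked_sum(&[1, 2, 3])` gives `Some(6)`, while `checked_sum(&[i32::MAX, 1])` gives `None` rather than a wrapped value.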

Reasonable people can differ on whether quiet NaN propagation is good or bad, so I don't fault you for being in the latter camp, but I am surprised by your suggestion that it was particularly friendly to 1980s hardware. I was under the impression that an 8087 would have been equally happy to respond to NaN by quiet propagation, setting a global sticky flag or crashing your program, and it's modern hardware, with multiple heavily pipelined vector units running in parallel, that prefers to quietly propagate. Am I missing something?