Hacker News new | ask | show | jobs
by Const-me 1936 days ago
> If anyone knows why this design was not chosen and what fundamental downsides it has, I'd love to hear it.

No fundamental ones, but a few practical.

32-bit numbers have 2^32 unique values, the number is even. Your approach makes the range asymmetrical like it happens with integers.

The range for 8-bit signed integers is [ -128 .. +127 ]. On ARM NEON there’re two versions of integer negate and absolute instructions, some (like vqnegq_s8 or vqabsq_s8) do saturation i.e. transform -128 into +127, others (vnegq_s8, vabsq_s8) don’t change -128. Neither of them is particularly good: the saturated version violates -(-x) == x, non-saturated version violates abs(x)>=0. Same applies to the rest of the signed integers (16, 32, 64 bits), an all modern platforms.

With IEEE floats the range is symmetrical and none of that is needed. Moreover, PCs don’t have absolute or negate instructions, but they instead have bitwise instructions processing floats, like andps, orps, xorps, andnps, they allow to flip, clear or set just the sign bit, very fast.

Another useful property of IEEE representation is that for two intervals [ 0 .. FLT_MAX ] and [-FLT_MAX .. -0.0f ] sort order of floats corresponds to [inverted] sort order of 32-bit integers.

2 comments

> Your approach makes the range asymmetrical like it happens with integers.

Not necessarily since you've got (a lot of different) NaNs anyway. For the sake of argument, you could give up one one of them and make it positive zero (since this representation would be unnatural, it would slow stuff down, just as non-finite values do on many CPUs. Wouldn't matter that much since signed zeros would only arise from underflow).

> Another useful property of IEEE representation is that for two intervals [ 0 .. FLT_MAX ] and [-FLT_MAX .. -0.0f ] sort order of floats corresponds to [inverted] sort order of 32-bit integers.

I'm aware, but as you correctly note this only works in the right direction for unsigned values. And it's just not that important a benefit, I'd much rather have my calculations come out right than being able to sort positive floating point numbers with an integer sort routine.

> you could give up one one of them and make it positive zero

What do you expect to happen when you set the sign bit of that number? A possible answer to that is "negative zero", and now you have 2 separate encodings for negative zeroes: one of them with exponent 0, another one 0xFF, and they behave slightly differently.

If you manually screwed around with the most significant bit of the underlying bit pattern (rather than just writing -x like any normal person) I'd expect to get a NaN -- assuming of we keep an explicit sign bit in the representation at all. It would obviously make hardware implementation of negation more complex (and thus slower) but I don't think there is any conceptual problem.
PC hardware doesn’t have hardware implementation of negation.

Possible to do in 2 instruction, xorps to make zero, then subps to subtract. Combined, they gonna take 4-5 cycles of latency (xorps is 1 cycle, subps is 3 cycles on AMD, 4 cycles on Intel).

If you do that a lot, a single xorps with a magic number -0.0f gonna negate these floats 4-5 times faster. People don’t pay me because I’m a normal person, they do that because I write fast code for them :-)

On a serious note, I’d rather have the current +0.0 and -0.0 IEEE values to be equal and be the exact zero, and make another one with 0xFF exponent encoding inexact zeroes, +0.0f or -0.0f depending on the sign bit.

Or another option, redefine FLT_MIN to be 2.8E-45, and reuse the current FLT_MIN, which is 1.4E-45 / 0x00000001 bit pattern, as inexact zeroes.

I wonder if it would be useful to have a signed integer that has symmetric range and the one that is left is used as NaN. Overflows would set it to NaN for example. Then again, once that's on the table it's very tempting to steal two more values for +/- inf. I think it's very useful to have full range unsigned ints but signed ones could have range reduced to make them less error prone.