by jcranmer 1245 days ago
As far as I'm aware, the original unum proposals that Kahan was arguing against have been discarded by Gustafson and all that remains now for current advocacy is the posit type, which is essentially a floating-point type that's fixed-width with a variable-width exponent.

I don't know what the hardware costs of posits look like, since I'm not a hardware engineer, so I can't comment on that. For larger sizes, posits seem to be inferior to IEEE 754 floating-point. For smaller sizes (say 16-bit and smaller), posits may work better, as the limited size means that IEEE 754's scale-invariant nature [1] isn't as relevant, and packing more distinct numbers into the same bitwidth is more valuable [2].

[1] Put simply, in an IEEE 754 number, it doesn't matter if you measure your distance in nanometers, meters, or light-years--you'll get the same relative error either way. This is emphatically not the case in posits, where your relative error depends on the scale of the numbers.

[2] Posits combine ±infinity and NaN into a single value, and also do away with -0.0. From a numerical perspective, this is actually pretty cringe--there's a useful distinction there (and Kahan's talk gives some examples here)--but by the time you're at small bitwidths, you're likely limiting yourself to situations where the utility of these special values is questionable.
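Footnote [1] can be checked numerically. The following is a minimal Python sketch (standard library only, `math.ulp` needs Python 3.9+) that prints the gap between adjacent binary64 values relative to the value itself. The three sample lengths are illustrative values chosen for this example, not taken from the comment.

```python
import math

# Illustrative lengths in meters: one nanometer, one meter, roughly one light-year.
lengths_m = [1e-9, 1.0, 9.4607e15]

def relative_spacing(x: float) -> float:
    """Gap to the next representable binary64 value, relative to x."""
    return math.ulp(x) / x

for x in lengths_m:
    print(f"{x:>12.4e}  relative spacing = {relative_spacing(x):.3e}")

# For normal binary64 values the ratio always lies in (2**-53, 2**-52].
spacings = [relative_spacing(x) for x in lengths_m]
```

The ratio stays within a factor of two at every scale, which is the scale invariance the footnote describes. The posit side is not shown because it would need a third-party posit implementation.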


As to [2] I am very skeptical of the value of all the NaNs, -0 and Infs floating point has. NaN breaks x==x which is a pretty fundamental relationship for numbers to have. +-Inf sound useful in theory, but in practice they rarely give you a more useful result than NaN or the maximum/minimum value of your type (returning Inf on overflow has infinitely more error than returning the largest positive value, and if that isn't meaningful then an Inf probably wasn't either). Once you've gotten rid of -Inf, it becomes clear that -0.0 is a mistake. It breaks the identity 0+x==x and 0-x==-x. Furthermore, IEEE specifies sqrt(-0.0)==-0.0 and log(-0.0)==-Inf which are both nonsensical if you consider -0.0 as a limit from the negatives. Floats also have the unfortunate property that inv(x) can be infinite for finite x.
The value of -0 as distinguished from +0 has a few uses. The most obvious one is preserving sign in the case of overflow. A less obvious use case is handling branch cuts. There are uses in a few more cases: I've heard it's occasionally useful in things like coordinate systems, since something like "0°5'3" W" can be stored as (-0.0, 5, 3) after explosion and still display correctly. It's definitely niche, but it does have its uses.
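The branch-cut use can be illustrated with Python's `cmath`, whose square root has its cut along the negative real axis and uses the sign of a zero imaginary part to pick a side. This is only a sketch of the general idea, not the commenter's own use case, and the input -4 is an arbitrary choice.

```python
import cmath
import math

# The sign of the zero imaginary part selects which side of the cut we are on.
above = cmath.sqrt(complex(-4.0, 0.0))   # imaginary part is +0.0
below = cmath.sqrt(complex(-4.0, -0.0))  # imaginary part is -0.0

# copysign reads the sign bit even though -0.0 == 0.0 compares true.
sign_of_neg_zero = math.copysign(1.0, -0.0)

print(above, below, sign_of_neg_zero)
```

The two results differ only in the sign of their imaginary part, so the information carried by -0.0 survives the call.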

Returning a distinct value that retains the fact that it overflowed is quite useful--if you get that result out of the computation, you know you overflowed the computation rather than silently getting a meaningless result. Note in particular that infinities end up being sticky values: once a value goes infinite, it tends to stay infinite, which isn't true for largest finite values. Distinguishing between various kinds of "invalid" values turns out to be moderately useful in practice--I've used infinities a couple of times in my own code.
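A small Python sketch of the "sticky" point, comparing IEEE 754 overflow to infinity with a hypothetical saturating alternative. The `min(...)` clamp is written just for this illustration and is not an API of any number format.

```python
import math
import sys

big = sys.float_info.max          # largest finite binary64 value

overflowed = big * 2.0            # IEEE 754 default: overflow rounds to +inf
saturated = min(big * 2.0, big)   # hypothetical saturating alternative

# Scale both back down, as a later step of a computation might.
later_inf = overflowed / 4.0      # still inf: the overflow stays visible
later_sat = saturated / 4.0       # an ordinary-looking finite number

print(later_inf, later_sat)
```

After the division, only the infinite result still records that something overflowed earlier.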

NaNs are useful in representing a different kind of error than overflowed computation. Now there is a lot of room to criticize IEEE 754 here: "x != x" was quite frankly a mistake (basically the primary reason for it was the creators wanted to make testing for NaN easier than calling isnan(x)...). sNaNs are of course an abomination that just makes things worse. Multiple NaN payloads were originally intended (in part) to let developers debug the sources of NaNs, but this requires support that never really materialized. However, NaN payloads did find new use in making NaN-boxing a useful technique, and dedicating an entire exponent to special values simplifies several numerical analysis lemmas.
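For readers unfamiliar with NaN-boxing, here is a minimal Python sketch of the idea: a small integer is stored in the unused mantissa bits of a quiet NaN. The helper names `box` and `unbox` are made up for this example. It assumes the platform passes quiet NaN payloads through unchanged, which is typical on mainstream hardware but is not something Python guarantees.

```python
import math
import struct

QUIET_NAN_BITS = 0x7FF8_0000_0000_0000  # exponent all ones plus the quiet bit
PAYLOAD_MASK = (1 << 51) - 1            # the remaining 51 mantissa bits

def box(payload: int) -> float:
    """Hide a small non-negative integer in the payload of a quiet NaN."""
    if not 0 <= payload <= PAYLOAD_MASK:
        raise ValueError("payload does not fit in 51 bits")
    bits = QUIET_NAN_BITS | payload
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def unbox(value: float) -> int:
    """Recover the payload from a boxed NaN."""
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    return bits & PAYLOAD_MASK

boxed = box(0xCAFE)
print(math.isnan(boxed), hex(unbox(boxed)))
```

Real NaN-boxing runtimes usually also reserve a few payload bits as a type tag; that part is omitted here.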

> handling branch cuts

I agree this sounds great in theory, but I don't think it works very well in practice. E.g. what about 1/(x+1)? Also branch cuts matter most for complex arithmetic, and there +-0 doesn't help since you don't know the phase of the zero. Also, realistically, floating point has finite precision so there are very few non-toy examples where you can do an actual computation and reliably end up on the correct branch. I'd rather have all the real numbers represented before we start adding hyper-reals to the number system.

> Returning a distinct value that retains the fact that it overflowed is quite useful

Agreed, and I think that NaR in Posits does a good job of that while not taking a ridiculous number of values.

> I agree this sounds great in theory, but I don't think it works very well in practice.

I've actually done it once in practice myself. I forget the exact details, though. As I said, it is a niche use case, but it's useful to have when you are in that niche.

>NaN breaks x==x which is a pretty fundamental relationship for numbers to have

NaN is not a number, so it should NOT satisfy "fundamental relationships for numbers to have".

>+-Inf sound useful in theory, but in practice they rarely give you a more useful result

There are algorithms that are more performant using infs, and without having a way to denote overflow, you'd have to pre-check every operation to do serious numerical work, which basically cuts your performance in half.

>Once you've gotten rid of -Inf, it becomes clear that -0.0 is a mistake

>It breaks the identity 0+x==x and 0-x==-x.

No, you have some fundamental misunderstanding. IEEE explicitly guarantees these hold, even for -0.

> Furthermore, IEEE specifies sqrt(-0.0)==-0.0 and log(-0.0)==-Inf which are both nonsensical if you consider -0.0 as a limit from the negatives.

You're making up strawmen. -0 is not a "limit from the negatives" any more than +0 is a limit from the positives, which would break other made up requirements. That is why making up stuff that has zero bearing on what IEEE 754 specifies is arguing strawmen.

>Floats also have the unfortunate property that inv(x) can be infinite for finite x.

Integers have the same property: -(X) is not always the negative of X (in two's complement, negating the most negative value overflows). So this is not a problem except in made up goofiness.

Every objection you post is a lack of understanding numerical analysis and the needs of actual scientific software.

So you're skeptical- do you write numerical software professionally? I do, and have, and will do it in the future. There are very, very good reasons for all of those pieces you don't see the need for.

There's a reason unums have not caught on with the field of numerical software or numerical analysis - they simply don't allow writing robust, performant software, they solve no real issues, and add significant problems.

>so it should NOT satisfy "fundamental relationships for numbers to have".

If you have a list with a NaN in it, how should you make sort terminate (and where should the NaN end up)? I understand that in theory it is kind of arguable that NaN should be different, but breaking the total order is a really dumb decision.

>you'd have to pre-check every operation to do serious numerical work, which basically cuts your performance in half.

Can you give an example? Saturating overflow tends to do the same thing.

>IEEE explicitly guarantees these hold

This is kind of true. -0.0+0.0==0.0 and 0.0-0.0==0.0. IEEE does define -0.0==0.0 so IEEE does technically make this hold, but only by redefining == so that two different numbers are ==
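The disagreement here comes down to value equality versus bit-pattern equality. A short Python sketch, with a helper `bits` written for this example, shows both views of 0+x for x = -0.0.

```python
import struct

def bits(x: float) -> str:
    """Raw binary64 bit pattern of x as 16 hex digits."""
    return struct.pack(">d", x).hex()

x = -0.0
y = 0.0 + x          # under round-to-nearest, (+0) + (-0) is +0

same_value = (y == x)             # True: == treats the two zeros as equal
same_bits = (bits(y) == bits(x))  # False: only x has the sign bit set

print(bits(x), bits(y), same_value, same_bits)
```

So the identity holds under floating-point ==, while the stored results differ in the sign bit, which is what both sides of the argument are describing.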

> -0 is not a "limit from the negatives"

Then what is it? It's not a real number, and Kahan's justification of them comes from branch cuts of analytic functions, which only makes sense in the context of limits https://homes.cs.washington.edu/~ztatlock/599z-17sp/papers/b...

> Integers have the same property

Yeah and it sucks there too. In the fp case it makes it really annoying to do things like divide an array by a float quickly and accurately. You would want to take the inverse of the divisor and multiply by that, but doing so isn't safe if the divisor is subnormal.
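The reciprocal problem is easy to reproduce in binary64. In this Python sketch the divisor and the array values are arbitrary subnormal numbers picked for illustration.

```python
import math

divisor = 1e-310                 # subnormal: below 2.2250738585072014e-308
values = [1e-310, 2e-310, 5e-310]

direct = [v / divisor for v in values]        # one division per element

inverse = 1.0 / divisor                       # overflows to inf
via_inverse = [v * inverse for v in values]   # every product is inf

print(direct)
print(via_inverse)
```

Dividing element by element gives ordinary finite quotients, while the multiply-by-inverse shortcut turns every element into infinity.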

Yes. My day job is in solving Differential Algebraic equations, but I also have written a bunch of Julia's Libm.

>If you have a list with a NaN in it, how should you make sort terminate

Do whatever you want. If you're sorting floats, sort them to the front. Every language I've ever used for developing numerical software has a trivial IsNaN equivalent. So that's not a complaint worthy of claiming NaNs are not useful. I've written lots of numerical software and not once has this been an issue for me.
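One way to "sort them to the front", sketched in Python. The function name `sort_nans_first` is made up for this example, and the sort key is one possible choice among several.

```python
import math

def sort_nans_first(values):
    """Sort floats ascending, placing every NaN before the ordinary values."""
    # The key never compares a NaN with anything: NaNs map to (0, 0.0).
    return sorted(values, key=lambda v: (0, 0.0) if math.isnan(v) else (1, v))

data = [3.0, float("nan"), -1.0, float("inf"), float("nan"), 0.5]
result = sort_nans_first(data)
print(result)
```

Because the key imposes a total order, the sort behaves the same regardless of where the NaNs start out in the input.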

What value do you assign sqrt of a negative without some NaN type item? Or any of tons of other "not a number" results?

>so IEEE does technically make this hold, but only by redefining ==

There's no "redefining ==" here. You are upset that bit patterns are different, but == is not for bit patterns. You are confusing == for floats with == for bit patterns, which are not and need not be the same thing. I've never seen a language that gets these confused. If you want float ==, simply use language ==. If you want bitwise ==, then you usually have to do (often not portable) fiddling to convert to a bit pattern. It's like claiming reference == and structure field == should be the same, but both have uses. So languages have all sorts of ways to use the concept of equality, and they are all useful. Confusing them does not make the ones you don't like invalid or not extremely useful for people that do understand and use them.

>Yes. My day job is in solving Differential Algebraic equations, but I also have written a bunch of Julia's Libm.

Good. Then you should understand why, as an example, the C++ standard library has a large number of functions like fma, expm1, log1p, hypot, and many more. Sure, you can simply write log(1+x) instead of using log1p, but log1p is vastly better in this case because properties of IEEE 754 allow more precision. Instead of hypot(x,y) you could write sqrt(x*x+y*y), but hypot is much better. These functions exist because IEEE provides tools to analyze these computations and make much better versions than the naive way to write them. Unums, with varying precision, make this vastly harder (they lose precision across the domain, which makes anything hard to analyze).
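The difference between the naive and the dedicated functions shows up readily in binary64. This Python sketch uses the `math` module equivalents of the C++ functions named above; the inputs 1e-10 and 1e200 are arbitrary values chosen to make the effect visible.

```python
import math

x = 1e-10
naive_log = math.log(1.0 + x)   # 1.0 + x has already rounded away most of x
better_log = math.log1p(x)      # works from x directly

big = 1e200
naive_hyp = math.sqrt(big * big + big * big)  # big*big overflows to inf
better_hyp = math.hypot(big, big)             # stays finite, about 1.414e200

print(naive_log, better_log)
print(naive_hyp, better_hyp)
```

The naive logarithm is off from about the eighth significant digit, and the naive hypotenuse overflows even though the true result is well within range.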

So unums, with varying precision, violate fundamental properties for scientific computing, namely, they lose precision in really messy ways. You cannot start with P digits of precision and do even simple math and get an answer with P digits of precision. IEEE does allow this.

For example, sqrt(x^2)=|x| in IEEE (for no under/overflow). This does not work in unums, since they lose precision. Square something and lose digits. Fundamental to lots of scientific computation is the requirement to maintain precision throughout a calculation. Unums fail this spectacularly, making it incredibly messy to do correct scientific work.
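The IEEE half of this claim can be spot-checked in binary64 with a few lines of Python. The seed, sample count, and range below are arbitrary choices that keep x*x far from overflow and underflow. A random sample is evidence for the claim, not a proof of it.

```python
import math
import random

random.seed(1234)  # reproducible; the seed and range are arbitrary choices

samples = [random.uniform(1.0, 1e6) for _ in range(100_000)]
failures = sum(1 for x in samples if math.sqrt(x * x) != x)

print(f"{failures} of {len(samples)} binary64 samples failed sqrt(x*x) == x")
```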

The posit standard has a NaR value that does everything I wish NaN and Inf did in IEEE. It is the result of 1/0, sqrt(-1), etc. There is only one of them, it compares equal with itself, and it is defined as less than all other posits. Real numbers have a total order, so it's silly that floating point doesn't. Furthermore, the posit ordering operations (bitwise) are the same as the signed integer ones, which makes your processor simpler and makes it easier to do things like write radix sorts for floats.

> You are confusing == for floats with == for bit patterns

The problem is that == for floats doesn't behave like an equality operation. x==x doesn't hold (reflexivity) and x==y => f(x)==f(y) doesn't hold. These are the two most important parts of what equality means.

To take your example of sqrt(x*x): for Float16, of the 65k values, 34k give exact answers (counting NaNs as exact; otherwise subtract 2k), 16k overflow and 5k underflow. There are also 9k inexact answers, of which 6k are within 2 ULPs and the others are further off (since x*x loses precision due to subnormals). So in other words you get exact answers 1/2 of the time and close answers 60% of the time. With Posit16, you get 47k exact answers and 18k inexact answers. How inexact are these inexact answers? 15k are within 2 ULP and only 2.9k aren't. (Of the 2.9k that aren't, Float16 would have overflowed or underflowed in all but 278 of the cases, and these 278 cases are all accurate to less than 4 ULPs.)

Posits do lose the ability to do error free transforms, but IMO for 32 bit and smaller math, this isn't a major loss as if you want more accuracy you can use more bits and it will usually be faster than the error free transform.

I've done a similar experiment with log1p(expm1(x)) and for that Float16 has 35k exact, 26k overflow, 1.3k within 4 ULP and 3k with more than 4 ULP error. Posits for comparison are 38k exact, 19k within 4 ULP, and 8k more than 4 ULP.

>To take your example of sqrt(x*x),

Yes, for small floats posits do ok, but they fail for other sizes. For example, here's float32 vs posit32 for 100,000 random values in ranges 1e2,1e4,..,1e18.

Posit32 fails on (respectively) 21%, 71%, 91%, 97%, 99%, 99.9%, 99.97%, 99.99% of the cases. Float32 fails on 0 of them. Julia code at the bottom.

Posit even fails on simple integer multiplication so often that you'd be terribly pressed to know ahead of time when it happens. For example, take integers 1 to 40 for i and j, multiply as posit16 and as float16, and see how they do. Posit fails 1.75% of the values, float fails none.
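The float16 half of this experiment can be reproduced with Python's `struct` module, which supports IEEE 754 binary16 through the `"e"` format. The helpers `to_f16` and `mul_f16` are written for this sketch. The posit16 half is not reproduced because it needs a third-party posit implementation.

```python
import struct

def to_f16(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 value (returned as a float)."""
    return struct.unpack("<e", struct.pack("<e", float(x)))[0]

def mul_f16(a: float, b: float) -> float:
    """binary16 multiply: the binary64 product of two halves is exact,
    so one rounding back to binary16 gives the correctly rounded result."""
    return to_f16(to_f16(a) * to_f16(b))

# Collect every (i, j) whose binary16 product differs from the integer product.
wrong = [(i, j) for i in range(1, 41) for j in range(1, 41)
         if mul_f16(i, j) != i * j]

print(f"{len(wrong)} of 1600 products were inexact in binary16")
```

The largest product is 1600, which is below 2048, the point up to which binary16 represents every integer exactly.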

This is simple multiplication of numbers well within range. The same problem happens in posit32,64,anysize, but not for the same sized floats.

>These are The two most important parts of what equality means.

As a PhD in math, this is not what equality means. You'll find nothing like that here for example https://en.wikipedia.org/wiki/Equality_(mathematics)

And if you're worried about equality, you might notice that in posit16, 27*39 gives 1052 instead of 1053, which is real (in)equality. You worry so much about made up concerns that you miss the crazy bad results scattered throughout posits.

Posits of all sizes make errors when multiplying by powers of two that floats do not make (due to their inability to keep digits). For example, in posit8, 2.0*1.03125 returns 2.0, 10*2 returns 16, and examples this bad can be found for any size posit.

To see this, take 1e6 random values in 0-100, mult by 2, then divide by 2, and see how many made it round trip. All float16 values do. 4% of posit16 values do not round trip. These are small numbers - the entire computation stays in the range 0-200, and this is even the base of the underlying number. Posit32 has the same failure rate for the same reason: posits lose precision even under small multiplications.
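The float16 side of this round trip can be checked exhaustively rather than by sampling, since there are only 65536 bit patterns. This is a Python sketch using the `struct` module's binary16 format, with helper names made up for the example. The posit16 side is not reproduced here.

```python
import struct

def to_f16(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 value (returned as a float)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def all_f16_in(lo: float, hi: float):
    """Yield every binary16 value v with lo <= v <= hi."""
    for pattern in range(1 << 16):
        v = struct.unpack("<e", struct.pack("<H", pattern))[0]
        if lo <= v <= hi:   # NaN fails both comparisons and is skipped
            yield v

values = list(all_f16_in(0.0, 100.0))

# Multiply by 2, round to binary16, divide by 2, round again, compare.
lost = [v for v in values if to_f16(to_f16(v * 2.0) / 2.0) != v]

print(f"{len(lost)} of {len(values)} binary16 values failed the round trip")
```

Doubling and halving only change the exponent in a binary format, so no significand bits are lost as long as the intermediate stays in range.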

As a result, posits fail at x-y=0 means x=y, which is also pretty fundamental, is it not?

Want to compute a discriminant sqrt(b*b-4*a*c)? Good luck, nearby values for a,b,c don't give smooth results, and routinely give imaginary numbers when they should be real (due to the above screwiness around powers of 2).

There's so many failure cases, not even at the edge of the ranges, where posits fail and equivalent sized floats don't, that doing any simple computations is error prone.

Here's the Julia code for the sqrt failures. You can do similar error checks for a ton of computations and you'll find posits failing a significant amount of them.

     # count failures of float32 and Posit32 in Julia
     # for sqrt(x*x) ==?= x
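     using Random
     using SoftPosit # assumption: Posit32 comes from SoftPosit.jl; the comment does not name the package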

     Random.seed!(1234) # make reproducible

     scale = 1.0f0 # try exponent 4,6,8,10,etc
     for s in 1:9 # powers 2,4,6,8,10,...,18
         scale *= 100.0f0
  
         badF,goodF = 0,0
         badP,goodP = 0,0
         for i in 1:10000
             f = rand()*scale
   
             f1::Float32 = f
             f2 = f1*f1
             f3 = sqrt(f2)

             @assert typeof(f1) == Float32
             @assert typeof(f2) == Float32
             @assert typeof(f3) == Float32

             p1 = Posit32(f)
             p2 = p1*p1
             p3 = sqrt(p2)

             @assert typeof(p1) == Posit32
             @assert typeof(p2) == Posit32
             @assert typeof(p3) == Posit32

             if f1 != f3
                badF+=1
             else
                goodF+=1
             end

             if p1 != p3
                badP+=1
             else
                goodP+=1
             end
         end

         println("Scaling: $(scale)")
         println("float: $(goodF) good, $(badF) bad, $(100*badF/(goodF+badF)) % failed")
         println("posit: $(goodP) good, $(badP) bad, $(100*badP/(goodP+badP)) % failed")
     end

Also to be clear, unums v1 and v2 were (mostly) dumb ideas that haven't gone anywhere. Unums v3 (aka posits) are (IMO) a really good idea for how to generate a better floating point standard (see https://posithub.org/docs/posit_standard-2.pdf)