| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by andrepd 31 days ago

Some of the complexity that comes with this really comes from the complexity of IEEE734 itself, plus the fragmentation of alternatives at lower precision.

I would have loved if the article mentioned the efforts at integrating Posits [0] in risc-v. While IEEE734 compatibility will obviously be necessary for any foreseeable future, it would be nice if the industry could settle on a better alternative which avoids many of the flaws with IEEE floats.

[0] https://github.com/andrepd/posit-rust

2 comments

jcranmer 31 days ago

The takes I've seen from pretty much all the numerical analysts is that posits are not really viable as a replacement for floating-point; they're just a much worse implementation. There are a lot of flaws with IEEE 754 (I'm upset that NaN != NaN, e.g., and sNaNs seem to be universally considered a design mistake), but the underlying theory behind it is very well worked-out, and even its more controversial features (like denormals) turn out to have been vital innovations.

One of the issues with posits from a numerical perspective is that it's main claim to fame of being more accurate is handled by having the number of precision bits being dependent on the exponent value, which means it's a return to the days of having to scale your input so that the values are around unity to maximize, rather than the promise of floating-point of being scale-independent.

link

andrepd 31 days ago

> The takes I've seen from pretty much all the numerical analysts is that posits are not really viable as a replacement for floating-point; they're just a much worse implementation

Care to elaborate? All the analyses I've seen show that posits achieve generally higher precision at the same number of bits compared to IEEE floats. You don't even necessarily need to care about scaling your inputs to be close to ±1; most problem domains already naturally have this!

The "quire" accumulator also enables new kinds of algorithms that are just not possible with IEEE (technically one can have a fixed-point accumulator for IEEE floats, but it's not as practical).

link

jcranmer 31 days ago

See e.g. https://people.eecs.berkeley.edu/~demmel/ma221_Fall20/Dinech...

link

SideQuark 31 days ago

> which avoids many of the flaws with IEEE floats

... by repeating lots of the flaws that led to IEEE754, requiring extra accumulators (the "quire") and hardware to do basic ops since the posit format alone fails, and making numerical analysis a complete mess, breaking the ability to write correct numerical algorithms.

They lose precision over large dynamic ranges, making algorithms fail on many inputs, without extreme care (and loss of accuracy over such ranges), lack of NaN/inf makes them fail on lots of other issues (and there are algorithms requiring NaN and inf behavior under IEEE754 for performance - I'll list one I recently made below...), this lack makes it harder to debug where algorithms broke, costing development time, ....

The algo I recently developed needed to find extrema of cubics over a finite range. This requires solving a quadratic. A quadratic root solver can have /0 = inf and sqrt(-) = NaN cases, which are often fiddled with using branches.

In my case I knew I'd be doing these in batches, and wanted C/C++ code to auto vectorize and do them in SIMD, and did not want to pay the cost for branches. This speed up the flow by about 8x on almost all larger processors, at the cost of some slots having NaN or inf. Those with NaN or inf had underlying cubics I could discard. So by using the IEEE754 aware multi parallel root finder (written in strd c++), I could check that the roots were in my interval (also parallelized) as a <= root <= b, which fails for root being NaN or inf. This check is also parallelzied.

All in standard C++, no hint of parallelization intrinsics, handled by modern compilers perfectly, and getting massive speed gains.

This is but one place NaN and inf are extremely useful. This type of use appears all over in scientific computing, graphics, pysics sims, etc.

Posits cannot handle this type of stuff.

link

andrepd 31 days ago

To be honest I'm a bit confused by your comments but the parts I can understand show some misunderstandings.

> lack of NaN/inf makes them fail on lots of other issues

You comment repeats several times assertions of the form "fails on lots of stuff", but does not really elaborate further x) But that aside.

Posits do have a NaN value, with the difference being that only 1 bit pattern is reserved to it instead of potentially quadrillions of them. This makes a huge difference especially at lower precisions, where NaN bit patterns waste an appreciable amount of the total set of values. They do not have inf, they simply do not overflow.

> In my case I knew I'd be doing these in batches, and wanted C/C++ code to auto vectorize and do them in SIMD, and did not want to pay the cost for branches. This speed up the flow by about 8x on almost all larger processors, at the cost of some slots having NaN or inf. Those with NaN or inf had underlying cubics I could discard.

Can you point to your code? I'd be very very surprised if it somehow cannot be rendered using posits.

Pathological cases do exist, of course. Nobody claims that posits are uniformly better than IEEE floats, such a claim being obviously, trivially nonsensical.

> All in standard C++, no hint of parallelization intrinsics, handled by modern compilers perfectly, and getting massive speed gains

Speed gains over what? x) I don't get it. Over branchy or scalar IEEE code? If yes, what's the relevance of that?

And yes, a 40-year old format with decades of hardware and compiler support works better in practice than any conceivable alternative... because all such alternatives are proof-of-concepts with no hardware or compiler support! There's no commercial hardware of any kind for posits, let alone SIMD. I really fail to see your point here.

link

dadoum 31 days ago

Correct me if I am wrong, but your algorithm looks to me like it would work fine with posits? You would just get a NaR value instead of both NaN and inf, because div-by-zero and sqrt(-) both yield NaR (the difference here with floats is that it is unique and can get compared to NaR), and so it just works fine?

link