by jlgustafson 2540 days ago
At the risk of a "flame war" where there are no winners, I would like to comment on some of the statements here before they get stale. If we avoid ad hominem attacks and stick to the math, the claims, and counterexamples, this can be a useful scientific discussion and I very much welcome all the criticism of my ideas.

The irreproducibility of IEEE 754 float calculations is well documented... on Wikipedia, by William Kahan, and in an excellent paper by David Monniaux titled "The pitfalls of floating-point computations". It is amazing that this is tolerated, but IEEE 754 has done a great deal to lower the expectations of computer users regarding mathematically correct behavior.

The posit approach is not merely a format but also the Draft Standard. Whereas floats can arbitrarily use "guard bits" to covertly do calculations with greater accuracy, the posit standard rules that out. Whereas the float standard recommends that math functions like log(x), cos(x) etc. be correctly rounded, the draft posit standard mandates that they be correctly rounded (or else they have to use a function name that clarifies that they are not the correctly-rounded function). By the draft posit standard, you cannot do anything not specified in the source code (like noticing that a multiply and an add could be fused into a multiply-add with deferred rounding, and so calling fused multiply-add without telling anyone). The source code completely defines what the result will be, bitwise, or it is not posit-compliant. It cannot depend on internal processor flags, optimization levels, or special hardware with guard bits to improve accuracy; this is what corrupted the IEEE 754 Standard and made it an irreproducible environment to this day.
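A minimal sketch of why a silently fused multiply-add changes results, assuming Python on a platform with IEEE 754 doubles. The values a, b, c are illustrative, and exact rationals stand in for the single deferred rounding of a fused operation:

```python
from fractions import Fraction

# Illustrative values: the exact product a*b is 1 - 2**-60,
# which is not representable as a double.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

# Separate multiply and add: the product is rounded to 1.0 before the add.
unfused = a * b + c

# Emulated fused multiply-add: compute exactly, round once at the end.
fused = float(Fraction(a) * Fraction(b) + Fraction(c))

print("unfused:", unfused)
print("fused:  ", fused)
print("bitwise identical:", unfused == fused)
```

The same source expression yields two different answers depending on whether the compiler or hardware decides to fuse, which is the kind of hidden choice described above.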

The claim that posits are a "drop-in" replacement for floating point needs a lot of clarification, and this is unfortunately left out of much of the coverage of the idea. Clearly, if an algorithm assigns a hexadecimal value to encode a real value, that will need work to port from IEEE floats to posits. The math libraries need to be rewritten, as well as scanf and printf in C and their equivalents for other languages. However, a number of researchers have found that they can substitute a posit representation for a float representation of the same size, and they get more accurate results with the same number of bits. I call that "plug-and-play" replacement; yes, there are a multitude of side effects that might need to be managed, but it's nothing like the jarring change, say, of moving from serial execution to parallel execution. It's really pretty easy, and it's easy to build tools that catch the 'gotcha' cases.

Some here have suggested the use of rational number representation, or said that there are redundant binary representations of the same numerical value. Unlike floats, posits do not have redundancy. I suspect someone is confused by the Morris approach to adjusting the tradeoff between fraction bits and exponent bits, which produces many redundant ways to express the same mathematical value.

Perfect additive associativity is available, as an option, with the quire, if needed. Multiplicative associativity is available, as an option, by calling fused multiply-multiply in the draft posit standard. Because quire operations appear to be both faster (free of renormalization and rounding) and more accurate (exact until converted back to posit form), I am puzzled regarding why anyone would want to do things more slowly and with less accuracy.
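To make the associativity point concrete, here is a small sketch in Python. It does not use posits or a real quire: Python's Fraction type is used as a stand-in for an exact accumulator that is rounded only once at the end, and the helper names float_sum and exact_sum are made up for this example.

```python
from fractions import Fraction
from itertools import permutations

values = [1e16, 1.0, -1e16]  # illustrative inputs

def float_sum(xs):
    """Ordinary summation: every partial sum is rounded to a double."""
    total = 0.0
    for x in xs:
        total += x
    return total

def exact_sum(xs):
    """Quire-like summation: accumulate exactly, round once at the end."""
    acc = Fraction(0)
    for x in xs:
        acc += Fraction(x)
    return float(acc)

# Collect the distinct results over every ordering of the inputs.
float_results = {float_sum(p) for p in permutations(values)}
exact_results = {exact_sum(p) for p in permutations(values)}

print("rounded at each step:", sorted(float_results))
print("rounded once:        ", sorted(exact_results))
```

With rounding at each step the answer depends on the order of the operands; with exact accumulation every ordering gives the same result.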

Kulisch blazed the way with his exact dot product; unfortunately, any exact dot product based on IEEE floats will have an accumulator with far too many bits (like 4,224 for IEEE double precision) and an accumulator that is just a bit larger than a power-of-two size. The "quire" of posits is always a power-of-two, much more hardware-friendly. It's 128 bits for 16-bit posits, and 512 bits for 32-bit posits, the width of a cache line on x86, or of an AVX-512 instruction.

"A little knowledge is a dangerous thing." In evaluating posit arithmetic, please use more than what you see in a ycombinator blog. You might discover that there are several decades of careful decision-making behind the design of posit arithmetic. And unlike Kahan, I subject my ideas to critical review by the community and learn from their input. The 1985 IEEE format is grossly overdue for a change.

1 comments

I want to add a few comments, as most of the discussions here concerned the hardware implementation and only a few pointed to possible applications. I work on weather and climate simulations, but my opinions should apply in general to CFD or PDE-type problems.

Yes, having redundant bit patterns is not great when designing a number format. However, even for Float16 (half-precision), making use of the 3% NaNs is wise, but not going to be a game changer. Some others discussed pros and cons of negative zero and also negative infinity: In my view you want to have a bit pattern that tells you that the answer you get is not real, but whether it's +/- Inf or some NaN is pretty much irrelevant. Using these bit patterns for something else sounds like a very reasonable approach to me. Furthermore, I've never come across a good reason for -0 in our applications.
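The 3% figure can be checked by brute force. A short sketch, assuming Python's struct module with its "e" (IEEE 754 binary16) format, that decodes every 16-bit pattern and counts the NaNs and infinities:

```python
import math
import struct

nan_count = 0
inf_count = 0

# Decode all 65536 bit patterns as IEEE 754 half-precision values.
for bits in range(1 << 16):
    (value,) = struct.unpack("<e", struct.pack("<H", bits))
    if math.isnan(value):
        nan_count += 1
    elif math.isinf(value):
        inf_count += 1

nan_share = nan_count / (1 << 16)
print(f"NaN patterns: {nan_count} ({nan_share:.1%}), Inf patterns: {inf_count}")
```

Every pattern with an all-ones exponent and a nonzero fraction is a NaN, which is where the roughly 3% of wasted encodings comes from.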

When it comes to weather and climate models in HPC, I see the following potential for posits: Similar to how BFloat16 is supported on TPUs, I could see Posit16 being supported by some specialised hardware like GPUs, FPGAs etc. I'm saying that because for us it's not important to have a whole operating system running in posits (although I probably wouldn't mind) but to have them for some performance-critical algorithms. Unfortunately, weather and climate models are far more complex than some dot products and we usually have to deal with a whole zoo of algorithms, causing weather and climate models to easily cover several million lines of code. Now let's say we know our model spends 20% of the time in algorithm A, which only requires a certain (low) precision to be stable and to yield reasonable results; then it would indeed be a big game changer if we could run this algorithm in, say, 16bit. In exchanging precision for speed we would probably want to push things to the edge, i.e. if we can just about do it in 16bit, then we should. Now there are several 16bit formats: Float16, BFloat16, Posit16, Posit16_2 (with 2 exp bits), and technically also Int16. Let's forget about the technical details of these formats and focus on where they actually differ considerably: what the dynamic range is, and where on the real axis you get how much precision to represent numbers. Yes, from a computer science perspective the technical details also matter, but from our perspective most of them are pretty irrelevant and what actually matters are these two things: dynamic range and where the precision is. Because these two really determine whether your algorithm is gonna crash or whether you can use it operationally on your desktop computer or in a big fat $$$ supercomputer.

For PDE-type problems (that includes CFD and also weather and climate models), over the last year of my research I came to the following preliminary conclusions regarding dynamic range and precision with respect to the above-mentioned formats:

Int16: Let's forget about it.

Float16: The precision is okay, but rarely needed towards the edges of the dynamic range. Floatmin might work, however, floatmax with 65504.0 is easily a killer. Might work with a no-overflow rounding mode and smart rewriting of algorithms to avoid large numbers.

BFloat16: For our applications having only 7 significant bits is not enough; I didn't come across a single sophisticated algorithm that works with BFloat16.

Posit16 (with 1 exp bit): Great, puts a lot of precision where it's needed but also allows for a reasonable dynamic range.

Posit16 (with 2 exp bits): Probably even better, the sacrifice of a bit of precision in the middle is fine and the wide dynamic range gives it the potential to also work with algorithms that are hard to squeeze into a smaller dynamic range.
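For a rough feel of the dynamic ranges involved, here is a small Python sketch. The helper names ieee_range and posit_range are made up for this example, and it assumes the usual definitions: IEEE-style formats with subnormals, and posits with maxpos = useed^(n-2), useed = 2^(2^es), minpos = 1/maxpos.

```python
import math

def ieee_range(exp_bits, frac_bits):
    """Smallest positive (subnormal) and largest finite value of an IEEE-style format."""
    bias = 2 ** (exp_bits - 1) - 1
    largest = (2.0 - 2.0 ** -frac_bits) * 2.0 ** bias
    smallest = 2.0 ** (1 - bias - frac_bits)
    return smallest, largest

def posit_range(n_bits, es):
    """minpos and maxpos of a posit with n_bits total and es exponent bits."""
    maxpos = 2.0 ** ((n_bits - 2) * 2 ** es)
    return 1.0 / maxpos, maxpos

formats = {
    "Float16": ieee_range(5, 10),
    "BFloat16": ieee_range(8, 7),
    "Posit16 (1 exp bit)": posit_range(16, 1),
    "Posit16 (2 exp bits)": posit_range(16, 2),
}

for name, (smallest, largest) in formats.items():
    decades = math.log10(largest / smallest)
    print(f"{name:22s} min={smallest:.3e} max={largest:.3e} (~{decades:.0f} decades)")
```

This only covers the range; it says nothing about where the precision sits. A posit spends most of its fraction bits near 1 and fewer towards minpos and maxpos, whereas an IEEE-style float has the same number of significant bits across its normal range.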

In short, posits actually fit the numbers our algorithms produce much better. And this can indeed be the game changer: If a GPU supports posit arithmetic and we can run algorithm A on it in 16bit: Wonderful, contract sold! But if we couldn't with BFloat16 or Float16 then there is no future for 16bit in our field.

I explain more about this in this paper: dx.doi.org/10.1145/3316279.3316281

And there are two talks which tell a similar story: https://www.youtube.com/watch?v=XazIx0cMVyg https://www.youtube.com/watch?v=wp7AYMWlPLw

Or simply drop me an email if you have questions (I am unlikely to respond here); you can find the address on my website: milank.de