Hacker News new | ask | show | jobs
by orlp 383 days ago
I helped design an API for "algebraic operations" in Rust: <https://github.com/rust-lang/rust/issues/136469>, which are coming along nicely.

These operations are

1. Localized, not a function-wide or program-wide flag.

2. Completely safe, -ffast-math includes assumptions such that there are no NaNs, and violating that is undefined behavior.

So what do these algebraic operations do? Well, one by itself doesn't do much of anything compared to a regular operation. But a sequence of them is allowed to be transformed using optimizations which are algebraically justified, as-if all operations are done using real arithmetic.

4 comments

-ffast-math is actually something like 15 separate flags, and you can use them individually if you want. 3 of them are "no NaNs," "no infinities," and "no subnormals." Several of the other flags allow you to treat math as associative or distributive if you want that.

The library has some merit, but the goal you've stated here is given to you with 5 compiler flags. The benefit of the library is choosing when these apply.

There's probably a benefit to being able to choose which you want in different places though, right?
That sounds neat. What would be really neat is if the language helped to expose the consequences of the ensuing rounding error by automating things that are otherwise clumsy for programmers to do manually, like running twice with opposite rounding directions, or running many many times with internally randomized directions (two of the options in Sec 4 of *). That is, it would be cool if Rust enabled people learn about the subtleties of floating point, instead of hiding them away.

* https://people.eecs.berkeley.edu/~wkahan/Mindless.pdf

Are these calls going to clear the FTZ and DAZ flags in the MXCSR on x86? And FZ & FIZ in the FPCR on ARM?
I don't believe so, no. Currently these operations only set the LLVM flags to allow reassociation, contraction, division replaced by reciprocal multiplication, and the assumption of no signed zeroes.

This can be expanded in the future as LLVM offers more flags that fall within the scope of algebraically motivated optimizations.

Ah sorry I misunderstood and thought this API was for the other way around, i.e. forbidding "unsafe" operations. (I guess the question reverses to setting those flags)

('Naming: "algebraic" is not very descriptive of what this does since the operations themselves are algebraic.' :D)

> ('Naming: "algebraic" is not very descriptive of what this does since the operations themselves are algebraic.' :D)

Okay, the floating point operations are literally algebraic (they form an algebra) but they don't follow some common algebraic properties like associativity. The linked tracking issue itself acknowledges that:

> Naming: "algebraic" is not very descriptive of what this does since the operations themselves are algebraic.

Also this comment https://github.com/rust-lang/rust/issues/136469#issuecomment...

> > On that note I added an unresolved question for naming since algebraic isn't the most clear indicator of what is going on. > > I think it is fairly clear. The operations allow algebraically justified optimizations, as-if the arithmetic was real arithmetic. > > I don't think you're going to find a clearer name, but feel free to provide suggestions. One alternative one might consider is real_add, real_sub, etc.

Then retorted here https://github.com/rust-lang/rust/issues/136469#issuecomment...

> These names suggest that the operations are more accurate than normal, where really they are less accurate. One might misinterpret that these are infinite-precision operations (perhaps with rounding after a whole sequence of operations). > > The actual meaning isn't that these are real number operations, it's quite the opposite: they have best-effort precision with no strict guarantees. > > I find "algebraic" confusing for the same reason. > > How about approximate_add, approximate_sub?

And the next comment

> Saying "approximate" feels imperfect, as while these operations don't promise to produce the exact IEEE result on a per-operation basis, the overall result might well be more accurate algebraically. E.g.: > > (...)

So there's a discussion going on about the naming

It doesn't feel appropriate to comment there for me not knowing any Rust really, but "lax_" (or "relax_") would have the extra benefit of being very short.

(Is this going to overload operators or are people going to have to type this… a lot… ?)

Rust has some precedence for adding convenience newtypes with overloaded operators (eg. `Wrapping<I>´ for `I.wrapping_add(I)` etc). Such a wrapper isn't currently proposed AFAIK but there's no reason one couldn't be added in the future I believe.
WebAssembly also ended up calling its set of similar instructions relaxed.
Does that mean that a physics engine written with these operations will always compile to yield the same deterministic outcomes across different platforms (assuming they correctly implement (or able to do so) algebraic operations)?
It's more like the opposite. These tell the compiler to assume for optimization purposes that floats are associative and so on (ie. algebraic), even when in reality they aren't. So the results may vary depending on what transformations the compiler performs – in particular, they may vary between optimized and non-optimized builds, which normally isn't allowed.
> These tell the compiler to assume for optimization purposes that floats are associative and so on (ie. algebraic), even when in reality they aren't.

I wonder if it is possible to add an additional constraint that guarantees the transformation has equal or fewer numerical rounding errors. E.g. for floating point doubles (0.2 + 0.1) - 0.1 results in 0.20000000000000004, so I would expect that transforming some (A + B) - B to just A would always reduce numerical error. OTOH, it's floating point maths, there's probably some kind of weird gotcha here as well.

Pretty sure that’s not possible. More accurate for some inputs will be less accurate for others. There’s a very tricky tension in float optimization that the most predictable operation structure is a fully skewed op tree, as in naive left-to-right summation, but this is the slowest and least accurate order of operations. Using a more balanced tree is faster and more accurate (great), but unfortunately which tree shape is fastest depends very much on hardware-specific factors like SIMD width (less great). And no tree shape is universally guaranteed to be fully accurate, although a full binary tree tends to have the best accuracy, but has bad base case performance, so the actual shape that tends to get used in high performance kernels is SIMD-width parallel in a loop up to some fixed size like 256 elements, then pairwise recursive reduction above that. The recursive reduction can also be threaded. Anyway, there’s no silver bullet here.
I think a restricted version might be possible to implement: only allow transformations if the transformed version has strictly fewer numerical rounding errors on some inputs. This will usually only mean canceling terms and collecting expressions like "x+x+x" into 3x.

In general, rules that allow fewer transformations are probably easier to understand and use. Trying to optimize everything is where you run into trouble.

Kahan summation is an example (also described in the top level article) of one such “gotcha”. It involves adding a term that - if floats were algebraic in this sense - would always be zero, so ffast-math often deletes it, but this actually completely removes the accuracy improvement of the algorithm
Under EForth with FP done in software:

2 f 1 f 1 f f+ f- f. 0.000 ok

PFE, I think reusing the GLIBC math library:

2e0 1e0 1e0 f+ f- f. 0.000000 ok

Every implementation of floating point can handle small to medium integers without rounding. So that example doesn't show anything we can learn from.
Buth for Forth you can have a totally working software floating point made from the people whose later would set the IEEE standard. Think about very small microcontrollers without FP in hardware, you can be sure that a software FP implementation will perfectly work under the standard thresolds.
No, there is no guarantee which (if any) optimizations are applied, only that they may be applied. For example a fused multiply-add instruction may be emitted for a*b + c on platforms which support it, which is not cross-platform.
No, the result may depend on how the compiler reorders them, which could be different on different platforms.