Hacker News
by dhruvdh 2424 days ago
It's very easy to do FMAs in Rust using .mul_add() on floats, which the author didn't seem to know about.
2 comments
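
For readers who haven't used it, here is a minimal sketch of what .mul_add() looks like on f64. The function names `fused_axpy` and `horner` are made up for illustration and are not from the article being discussed. `f64::mul_add(self, a, b)` computes `self * a + b` with a single rounding; whether it compiles to a hardware FMA instruction depends on the target features enabled at build time.

```rust
/// Computes `a * x + y` with one rounding instead of two.
/// (Hypothetical helper, named only for this example.)
fn fused_axpy(a: f64, x: f64, y: f64) -> f64 {
    a.mul_add(x, y)
}

/// Evaluates c[0] + c[1]*x + c[2]*x^2 + ... with Horner's rule,
/// using one fused multiply-add per coefficient.
fn horner(coeffs: &[f64], x: f64) -> f64 {
    coeffs
        .iter()
        .rev()
        .fold(0.0, |acc, &c| acc.mul_add(x, c))
}
```

Calling `horner(&[1.0, 2.0, 3.0], 2.0)` evaluates 1 + 2*2 + 3*4. Plain `a * x + y` in Rust is not fused automatically, which is why the explicit method call is needed.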

Ideally the compiler should be able to do this by itself though, at least with the appropriate flag to enable it.
FMA isn't a safe optimization as it can give different results.

C++ compilers have flags to enable it globally. gcc and clang include the optimization in -Ofast.

Rust allows you to choose at a code level (but usually people don't know about it). Perhaps it should also have a global fast-math flag that would automatically optimize it. Pros and cons to that.

FMA is "safe" in that if it breaks your code, it was already broken. It can only make the results slightly more accurate, unlike, for instance, the rsqrt instruction, which is less accurate (and as such is not a safe optimization).

GCC emits FMA instructions at -O2 without -ffast-math.

Well, it could "break" your code in that it might make your code produce different results than a separate equivalent implementation that didn't use FMA.

Edit: Ok actually it sounds like it could literally break some algorithms, see https://news.ycombinator.com/item?id=21342974
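
A small sketch of how this can go wrong, assuming a compiler is free to contract either product of `a*a - b*b` into an FMA. The three function names are hypothetical; the two fused variants spell out by hand the two contractions a compiler could pick. With a == b the plain version is exactly zero, while the fused versions return the rounding error of the product, with a sign that depends on which product was fused.

```rust
/// Two roundings per product, then a subtraction: exactly 0.0 when a == b.
fn diff_of_squares_plain(a: f64, b: f64) -> f64 {
    a * a - b * b
}

/// Contraction that fuses the first product: exact a*a minus rounded b*b.
fn diff_of_squares_fused_left(a: f64, b: f64) -> f64 {
    a.mul_add(a, -(b * b))
}

/// Contraction that fuses the second product: rounded a*a minus exact b*b.
fn diff_of_squares_fused_right(a: f64, b: f64) -> f64 {
    (-b).mul_add(b, a * a)
}
```

For an input like x = 1 + 2^-27, whose square is not exactly representable, `diff_of_squares_fused_right(x, x)` is a tiny negative number, so taking its square root yields NaN where the unfused code would give 0.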

I wasn't trying to imply it should be on by default. Often one does not care about the lower bits of the floats, but does want the speed. For some tasks it's very much the opposite. Being able to specify a global option with local override is a great combo.
FMA is not always faster: it has a high latency (5-6 cycles depending on the CPU), while add and mul have very low latency.

This means that to fully utilize FMA you need to unroll a loop more. Sometimes you just can't, and other times you end up using more instructions and more cache.

In short it's not always better.

Also, as others said, FMA has better accuracy than separate add + mul.
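
To make the unrolling point concrete, here is a rough sketch of a dot product that keeps four independent accumulators so that consecutive FMAs do not wait on each other's result. The name `dot_unrolled` and the choice of four lanes are assumptions for illustration; the right unroll factor depends on the FMA latency and throughput of the CPU in question.

```rust
/// Dot product with four independent FMA accumulator chains.
/// (Hypothetical helper; the lane count 4 is an arbitrary choice.)
fn dot_unrolled(x: &[f64], y: &[f64]) -> f64 {
    assert_eq!(x.len(), y.len());
    let mut acc = [0.0f64; 4];
    let mut xc = x.chunks_exact(4);
    let mut yc = y.chunks_exact(4);
    for (xs, ys) in xc.by_ref().zip(yc.by_ref()) {
        // Each lane depends only on its own previous value,
        // so the four FMAs can overlap in the pipeline.
        for lane in 0..4 {
            acc[lane] = xs[lane].mul_add(ys[lane], acc[lane]);
        }
    }
    // Handle the leftover elements (fewer than 4) sequentially.
    let mut tail = 0.0;
    for (&a, &b) in xc.remainder().iter().zip(yc.remainder()) {
        tail = a.mul_add(b, tail);
    }
    (acc[0] + acc[1]) + (acc[2] + acc[3]) + tail
}
```

Note that splitting the sum across accumulators changes the order of additions, so the result can differ in the last bits from a simple sequential loop.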

If you’re doing fiddly numerical work, this must definitely be optional, as swapping separate multiplication and addition for FMA (or vice versa) can compromise correctness. In some cases you need two different algorithms if FMA is present or absent.
Do you have concrete examples of such algorithms?
Some algorithms guarantee that some arithmetic operation(s) applied to 2+ floating point inputs will result in a list of floating point outputs which when summed have exactly the correct result. This gets all screwed up if you mess with the order of operations or the rounding of intermediate results.

e.g. https://www.cs.cmu.edu/~quake/robust.html

Some keywords to look for: “compensated arithmetic”, “error-free transformations”.

FMAs generally speed up these tools, but you need to be careful and deliberate about how they are used.

(Disclaimer: I am not an expert on this, just some guy on the internet.)
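
As an illustration of the "error-free transformations" keyword, here is a sketch of two standard building blocks: the FMA-based product transform and Knuth's branch-free sum transform. The function names are chosen for this example. Both return a rounded result plus an error term such that the pair represents the exact result, barring overflow and underflow. Both also rely on the operations being evaluated exactly as written, which is the property that contraction or reassociation can destroy.

```rust
/// Returns (p, e) with p = fl(a * b) and p + e == a * b exactly
/// (assuming no overflow or underflow). Relies on a true fused multiply-add.
fn two_product_fma(a: f64, b: f64) -> (f64, f64) {
    let p = a * b;
    let e = a.mul_add(b, -p); // exact a*b minus the rounded product
    (p, e)
}

/// Returns (s, e) with s = fl(a + b) and s + e == a + b exactly
/// (assuming no overflow). Knuth's branch-free TwoSum.
fn two_sum(a: f64, b: f64) -> (f64, f64) {
    let s = a + b;
    let b_virtual = s - a;
    let a_virtual = s - b_virtual;
    let e = (a - a_virtual) + (b - b_virtual);
    (s, e)
}
```

Without FMA, the product transform needs a much longer sequence that splits each operand into high and low halves, which is one sense in which FMA speeds these tools up.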

Fair, I think it would be very helpful for Rust if some expert actually knew of a specific example for which this is the case. I think there is an RFC about enabling floating-point contraction by default, that would "silently" be able to do some of these transformations depending on the optimization level.
The one very important thing that often gets destroyed by compilers exploiting associativity is the TwoSum error-free transform, which is a vital component of several algorithms that deal with numerical error (most notably Kahan summation).

The problem is mentioned on the Wikipedia page for Kahan summation (and I have been able to reproduce it with gcc): https://en.wikipedia.org/wiki/Kahan_summation_algorithm#Poss...

This is actually my area of research, I could contribute if you point me to an RFC.
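
For context, a minimal sketch of Kahan summation next to a naive sum (function names chosen for this example). The correction line `(t - sum) - y` is algebraically zero, so a compiler that is allowed to treat floating-point addition as associative can simplify it away and silently turn the compensated sum back into the naive one. Rust does not reassociate floating-point operations by default, so the code below should behave as written.

```rust
/// Plain left-to-right summation.
fn naive_sum(xs: &[f64]) -> f64 {
    let mut sum = 0.0;
    for &x in xs {
        sum += x;
    }
    sum
}

/// Kahan (compensated) summation.
fn kahan_sum(xs: &[f64]) -> f64 {
    let mut sum = 0.0;
    let mut c = 0.0; // running compensation for lost low-order bits
    for &x in xs {
        let y = x - c;
        let t = sum + y;
        // Recovers what was lost when adding y to sum.
        // Algebraically zero, numerically not: this is the line
        // that associativity-based optimizations can eliminate.
        c = (t - sum) - y;
        sum = t;
    }
    sum
}
```

Summing 1.0 followed by many copies of 1e-16 shows the difference: each small term is below half an ulp of 1.0, so the naive loop never moves off 1.0, while the compensated loop accumulates them.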

It isn't a matter of being a Rust expert; fused multiply-add is an instruction on CPUs.
Care to rewrite the program with `.mul_add()`?
I did rewrite the code with mul_add, and didn't see any significant performance improvement. See my comment above.
Much appreciated! Thank you.