| HN Mirror


	by dhruvdh 2472 days ago
	It's very easy to do FMAs using .mul_add() on floats in Rust, which the author didn't seem to know about.
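
For anyone who hasn't used it: `f64::mul_add(a, b)` computes `self * a + b` with a single rounding. A minimal sketch, where the function name `horner_fma` is made up for illustration and is not from the article:

```rust
/// Evaluate a polynomial with Horner's rule, one fused multiply-add per coefficient.
/// `coeffs` is ordered from the highest-degree term down to the constant term.
/// (Hypothetical helper, only here to show how `mul_add` is called.)
fn horner_fma(coeffs: &[f64], x: f64) -> f64 {
    // acc.mul_add(x, c) computes acc * x + c with one rounding step.
    coeffs.iter().fold(0.0_f64, |acc, &c| acc.mul_add(x, c))
}
```

For example, `horner_fma(&[2.0, 3.0, 1.0], 2.0)` evaluates 2x² + 3x + 1 at x = 2. As far as I know, whether this becomes an actual FMA instruction depends on the target features you build with (e.g. `-C target-cpu=native`); without hardware support it can fall back to a much slower software routine.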

2 comments

magicalhippo 2472 days ago

Ideally the compiler should be able to do this by itself though, at least with the appropriate flag to enable it.

paulddraper 2472 days ago

FMA isn't a safe optimization as it can give different results.

C++ compilers have flags to enable it globally. gcc and clang include the optimization in -Ofast.

Rust allows you to choose at a code level (but usually people don't know about it). Perhaps it should also have a global fast-math flag that would automatically optimize it. Pros and cons to that.
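
To make "different results" concrete, here is a small sketch in Rust. The function name and the inputs are chosen purely for illustration: the product `a * a` has a low-order term that gets rounded away in the unfused version but survives in the fused one.

```rust
/// Returns (unfused, fused) results of computing a*a + c for inputs picked
/// so that the two disagree. (Illustrative helper, not from the article.)
fn fused_vs_unfused() -> (f64, f64) {
    let eps = 1.0 / (1u64 << 30) as f64; // 2^-30, exactly representable
    let a = 1.0 + eps;
    let c = -(1.0 + 2.0 * eps);

    // Exact product is 1 + 2^-29 + 2^-60; the 2^-60 term is lost when the
    // product is rounded to f64, so the sum cancels to zero.
    let unfused = a * a + c;

    // mul_add rounds only once, after the addition, so 2^-60 survives.
    let fused = a.mul_add(a, c);

    (unfused, fused)
}
```

The unfused expression comes out as exactly zero while the fused one comes out as 2^-60. Code that later tests the result against zero, or divides by it, behaves differently depending on which version the compiler picked.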

nwallin 2471 days ago

FMA is "safe" in that if it breaks your code, it was already broken. It can only make the results slightly more accurate, unlike, for instance, the rsqrt instruction, which is less accurate (and as such is not a safe optimization).

GCC emits FMA instructions at -O2 without -ffast-math.

lilyball 2471 days ago

Well, it could "break" your code in that it might make your code produce different results than a separate equivalent implementation that didn't use FMA.

Edit: Ok actually it sounds like it could literally break some algorithms, see https://news.ycombinator.com/item?id=21342974

magicalhippo 2471 days ago

I wasn't trying to imply it should be on by default. Often one does not care about the lower bits of the floats, but does want the speed. For some tasks it's very much the opposite. Being able to specify a global option with local override is a great combo.

mratsim 2471 days ago

FMA is not always faster. It has a high latency, 5-6 cycles depending on the CPU, while add and mul have very low latency.

This means that to fully utilize FMA you need to unroll a loop more. Sometimes you just can't, and other times you end up using more instructions and more cache.

In short it's not always better.

Also, as others said, FMA has better accuracy than separate add + mul.
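
A rough sketch of what "unroll more" means in practice, in Rust. The function name and the choice of four accumulators are assumptions for illustration; the right count depends on the CPU's FMA latency and throughput.

```rust
/// Dot product with four independent accumulators, so consecutive FMAs do not
/// have to wait on each other's results. (Hypothetical helper; 4 is arbitrary.)
fn dot_fma_unrolled(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len());
    let mut acc = [0.0_f64; 4];
    let mut a_chunks = a.chunks_exact(4);
    let mut b_chunks = b.chunks_exact(4);

    // Main loop: each lane forms its own dependency chain.
    for (ca, cb) in a_chunks.by_ref().zip(b_chunks.by_ref()) {
        for i in 0..4 {
            acc[i] = ca[i].mul_add(cb[i], acc[i]);
        }
    }

    // Leftover elements when the length is not a multiple of 4.
    let mut tail = 0.0_f64;
    for (x, y) in a_chunks.remainder().iter().zip(b_chunks.remainder()) {
        tail = x.mul_add(*y, tail);
    }

    (acc[0] + acc[1]) + (acc[2] + acc[3]) + tail
}
```

Note that splitting the sum across accumulators changes the order of additions, so the result can differ in the last bits from a plain sequential loop.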

jacobolus 2472 days ago

If you’re doing fiddly numerical work, this must definitely be optional, as swapping separate multiplication and addition for FMA (or vice versa) can compromise correctness. In some cases you need two different algorithms if FMA is present or absent.

fluffything 2472 days ago

Do you have concrete examples of such algorithms?

jacobolus 2472 days ago

Some algorithms guarantee that some arithmetic operation(s) applied to 2+ floating point inputs will result in a list of floating point outputs which when summed have exactly the correct result. This gets all screwed up if you mess with the order of operations or the rounding of intermediate results.

e.g. https://www.cs.cmu.edu/~quake/robust.html

Some keywords to look for: “compensated arithmetic”, “error-free transformations”.

FMAs generally speed up these tools, but you need to be careful and deliberate about how they are used.
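
One standard error-free transformation where FMA helps is the product of two floats: a single FMA recovers the rounding error of `a * b` exactly (barring overflow and underflow). A minimal sketch in Rust, with the function name chosen here for illustration:

```rust
/// Error-free product: returns (p, e) such that a * b == p + e exactly,
/// assuming no overflow or underflow. This depends on mul_add really being
/// fused; computing `a * b - p` with separate operations would just give 0.
fn two_product_fma(a: f64, b: f64) -> (f64, f64) {
    let p = a * b;            // rounded product
    let e = a.mul_add(b, -p); // exact rounding error of that product
    (p, e)
}
```

This also shows why the fusing has to be deliberate: the algorithm requires the FMA on the second line, and would equally break if a compiler fused or rearranged operations it was not asked to.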

(Disclaimer: I am not an expert on this, just some guy on the internet.)

fluffything 2472 days ago

Fair, I think it would be very helpful for Rust if some expert actually knows of a specific example for which this is the case. I think there is an RFC about enabling floating-point contraction by default, that would "silently" be able to do some of these transformations depending on the optimization level.

nestorD 2472 days ago

The one very important thing that often gets destroyed by compilers exploiting associativity is the TwoSum error-free transform, which is a vital component of several algorithms that deal with numerical error (most notably Kahan summation).

The problem is mentioned on the Wikipedia page for Kahan summation (and I have been able to reproduce it with gcc): https://en.wikipedia.org/wiki/Kahan_summation_algorithm#Poss...
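
For reference, a minimal sketch of both in Rust (textbook formulations, function names chosen here). Algebraically the error terms are always zero, which is exactly why a compiler that is allowed to treat floating-point addition as associative can simplify them away:

```rust
/// TwoSum: returns (s, e) with s = fl(a + b) and a + b == s + e exactly
/// (assuming no overflow).
fn two_sum(a: f64, b: f64) -> (f64, f64) {
    let s = a + b;
    let b_virtual = s - a;
    let a_virtual = s - b_virtual;
    let b_err = b - b_virtual;
    let a_err = a - a_virtual;
    (s, a_err + b_err)
}

/// Kahan (compensated) summation.
fn kahan_sum(values: &[f64]) -> f64 {
    let mut sum = 0.0_f64;
    let mut c = 0.0_f64; // running compensation for lost low-order bits
    for &x in values {
        let y = x - c;
        let t = sum + y;
        // With real-number algebra this is 0; in floating point it captures
        // what was rounded away in `sum + y`. Reassociation erases it.
        c = (t - sum) - y;
        sum = t;
    }
    sum
}
```

To my knowledge rustc does not reassociate floating-point operations by default, so this behaves as written; the breakage described above shows up under flags like `-ffast-math` in C and C++ compilers.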

This is actually my area of research, I could contribute if you point me to an RFC.

BubRoss 2471 days ago

It isn't a matter of being a Rust expert; fused multiply-add is an instruction on CPUs.

ibotty 2472 days ago

Care to rewrite the program with `.mul_add()`?

lovasoa 2472 days ago

I did rewrite the code with mul_add, and didn't see any significant performance improvement. See my comment above.

ibotty 2472 days ago

Much appreciated! Thank you.
