| HN Mirror



	by dmitrygr 1677 days ago
	Contrarian viewpoint: beware of not-fast-math. Making things like atan2f and sqrtf set errno takes you down a very slow path, costing you significant perf in cases where you likely do not want it. And most math will work fine with fast-math, if you are careful how you write it. (Free online numerical methods classes are available, e.g. [1].) Without fast-math most compilers cannot even use FMA instructions (costing you up to 2x in cases where they could otherwise be used) since they cannot prove it will produce the same result. FMA will actually likely produce a more accurate result, but without fast-math your compiler is not allowed to offer it to you. [1] https://ocw.mit.edu/courses/mathematics/18-335j-introduction...

4 comments

mbauman 1677 days ago

That's precisely the part that makes it so impossible to use! Sometimes it means fewer bits of accuracy than IEEE would otherwise give you; sometimes it means more. Sometimes it results in your code being interpreted in a more algebra-ish way, sometimes it's less.

That's why finer-grained flags are needed — yes, FMAs and SIMD are essential for _both_ performance and improved accuracy, but `-ffast-math` bundles so many disparate things together it's impossible to understand what your code does.

> And most math will work fine with fast-math, if you are careful how you write it.

The most hair-pulling part about `-ffast-math` is that it will actively _disable_ your "careful code." You can't check for NaNs. You can't check for residuals. It'll rearrange those things on your behalf because it's faster that way.
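As an illustration of the NaN-check problem: under `-ffinite-math-only` (which `-ffast-math` implies) a compiler is allowed to fold `x != x` to false. One workaround people use is to inspect the bits with integer operations instead. This is a sketch assuming IEEE 754 binary64 doubles; `is_nan_bits` is a hypothetical helper name, and whether it survives every optimizer setting is not guaranteed.

```c
#include <stdint.h>
#include <string.h>

/* Bit-level NaN test for IEEE 754 binary64: the exponent field is all ones
 * and the fraction field is nonzero. Only integer operations are used, so
 * the result does not depend on a floating-point comparison. */
int is_nan_bits(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);                  /* well-defined type pun */
    uint64_t exponent = (bits >> 52) & 0x7FFu;       /* 11 exponent bits */
    uint64_t fraction = bits & 0x000FFFFFFFFFFFFFull; /* 52 fraction bits */
    return exponent == 0x7FFu && fraction != 0;
}
```

Infinity has the same all-ones exponent but a zero fraction, so it is not reported as NaN.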

an1sotropy 1677 days ago

(in case anyone reading doesn't know: FMA = Fused Multiply and Add, as in a*b+c, an operation on 3 values, which increases precision by incurring rounding error once instead of twice)

I'm not an expert on this, but for my own code I've been meaning to better understand the discussion here [1], which suggests that there ARE ways of getting FMAs, without the sloppiness of fast-math.

[1] https://stackoverflow.com/questions/15933100/how-to-use-fuse...

simonbyrne 1677 days ago

-ffp-contract=fast will enable FMA contraction, i.e. replacing a * b + c with fma(a,b,c). This is generally okay, but there are a few cases where it can cause problems: the canonical example is computing an expression of the form:

a * d - b * c

If a == b and c == d (and all are finite), then this should give 0 (which is true for strict IEEE 754 math), but if you replace it with an fma then you can get either a positive or negative value, depending on the order in which it was contracted. Issues like this pop up in complex multiplication, or applying the quadratic formula.

a_e_k 1676 days ago

C99 [0] and C++11 [1] both have fma() functions that let you directly request it without the need to mess around with sloppier FP contracts to infer it.

[0] https://en.cppreference.com/w/c/numeric/math/fma

[1] https://en.cppreference.com/w/cpp/numeric/math/fma

StefanKarpinski 1677 days ago

The way Julia handles this is worth noting:

- `fma(a, b, c)` is exact but may be slow: it uses intrinsics if available and falls back to a slow software emulation when they're not

- `muladd(a, b, c)` uses the fastest possibly inexact implementation of `a*b + c` available, which is FMA intrinsics if available or just doing separate `*` and `+` operations if they're not

That gives the user control over what they need—precision or speed. If you're writing code that needs the extra precision, use the `fma` function but if you just want to compute `a*b + c` as fast as possible, with or without extra precision, then use `muladd`.
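A rough C analogue of that split, for comparison. This is not how Julia implements `muladd`; it is a sketch that leans on the C99 macro `FP_FAST_FMA`, which an implementation defines when `fma()` on doubles is generally about as fast as a separate multiply and add. The name `mul_add_fast` is hypothetical.

```c
#include <math.h>

/* "As fast as possible" multiply-add: use fma() only when the implementation
 * advertises it as fast, otherwise do a separate multiply and add. */
double mul_add_fast(double a, double b, double c)
{
#ifdef FP_FAST_FMA
    return fma(a, b, c); /* single rounding */
#else
    return a * b + c;    /* two roundings, unless the compiler contracts it */
#endif
}
```

Code that needs the single rounding regardless of speed would call `fma()` directly instead.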

adgjlsfhk1 1677 days ago

Note that this is only true in theory. In practice, there are still some bugs here that will hopefully be fixed by Julia 1.8.

dahart 1677 days ago

> which suggests that there ARE ways of getting FMAs, without the sloppiness of fast-math.

There are ways, indeed, but they are pretty slow; they prioritize accuracy over performance. And they’re still pretty tricky too. The most practical alternative for float FMA might be to use doubles, and for double-precision FMA it might be to bump to a 128-bit representation.
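The "use doubles for float FMA" idea looks roughly like this in C. It is a sketch, not a correctly rounded `fmaf`: the product of two floats is exact in double, but the sum is rounded once to double and again to float, and that double rounding can differ from a true fused result in rare cases. The function name is invented for illustration.

```c
/* Approximate single-precision fused multiply-add via double arithmetic. */
float fmaf_via_double(float a, float b, float c)
{
    double product = (double)a * (double)b; /* exact: 24 + 24 significand bits fit in 53 */
    return (float)(product + (double)c);    /* rounded to double, then to float */
}
```

With a = b = 1 + 2^-12 and c = -(1 + 2^-11) this returns 2^-24, whereas plain float arithmetic rounds the product to 1 + 2^-11 first and returns 0.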

Here’s a paper on what it takes to do FMA emulation: https://www.lri.fr/~melquion/doc/08-tc.pdf

an1sotropy 1677 days ago

I remember a teacher who said (when I was a student) something like "if you care about precision use double". Now that I'm teaching, I force students to only use single-precision "float"s in their code, with the message that FP precision is a finite resource, and you don't learn how to manage any resource by increasing its supply. I think my students hate me.

dahart 1677 days ago

Knowing said teacher ;) I wonder if he’d still say the same thing now… It’s good practice to have to use single precision (or even half-precision!) now and then in order to be forced to deal with precision issues. Yes, use doubles if you really need them and aren’t trying to learn. But they’re often a lot more than 2x more expensive, and they might not be necessary at all. I’ve heard people who develop commercial rendering software for movies you’ve probably seen say out loud that you never need doubles, you just need to understand how to use floats.

couchand 1677 days ago

Perhaps you were in the same lecture as me, when I asked the lead developer on Big Hero 6 why they didn't just use doubles to solve their precision woes, and he informed me that they literally couldn't afford to use doubles at that scale.

dahart 1676 days ago

You know, that is actually ringing a bell, I think I might have indeed. Above I was thinking of someone else who works on a certain renderer made in New Zealand, but it’s true that many studios use doubles either sparingly or not at all. That might be getting even more true as GPUs blend into production…

johncowan 1676 days ago

Memory is a finite resource too, but would you force your students to run all their programs in 12K of memory, just because that was how much memory I had in the machine I learned to program on in 1972?

dahart 1676 days ago

Why not? It’s a professor’s absolute prerogative what lessons they’re offering, and working in low memory is a great lesson to learn. Kids these days are lazy and spoiled with their gigabytes of ram and terabytes of disk. In my day… wait, never mind, I’m starting to sound old, eh?

The flip side question to you is, why should students get away with more than they need? Memory and cycles are wasting energy. We need engineers to understand how to be deeply efficient, not careless with resources. Memory is generally much more expensive than compute cycles in terms of energy use. Yes, please, teach the students how to program with less memory.

Low memory programming is a fantastic exercise for learning modern GPU programming, since you still need to conserve individual bytes when you’re trying to run ten thousand threads at the same time. Or if you’re just into Arduinos.

Other lessons that are great to learn, but take time to appreciate are how to avoid using any dynamic memory, how to avoid recursion, how to avoid function pointers or any of today’s tricky constructs (closures/futures/monads/y-combinators/etc.) I’m of course referring to how some people (like NASA) think of safety critical code https://en.wikipedia.org/wiki/The_Power_of_10:_Rules_for_Dev... But I will add that many of these rules have applied to console video game programming for a long time. They’re easing up lately, but the concepts still apply since coding for a console is effectively embedded programming.

adgjlsfhk1 1676 days ago

One reason is that one of the main resources students should learn to be efficient with is their time. There are definitely places where low memory use is important, but 95% of the time, the first place you should go is to use all the tricks you have to make writing code faster. Knowing how to be careful with precision is great, but so is just using Double (or even BigFloat) to get something that will work robustly without having to analyze as carefully.

an1sotropy 1676 days ago

(setting aside the anachronistic snark) you may have noticed other comments here attesting to how managing 32 bits of FP precision endures today as a relevant skill.

shoo 1677 days ago

each time complaints are raised about single precision, you could deduct 1 bit from the allowance of bits per floating-point value for the next assignment

adgjlsfhk1 1677 days ago

That's not what the parent meant. The parent meant that there are ways of generating fma instructions without using fast-math. Emulating an fma instruction is almost always a bad idea. (I should know: I've written fma emulation before, and it sucks.)

dahart 1677 days ago

Oh, my mistake, thanks. Yes you can use FMA instructions without the fast-math compiler flag for sure. Emulation being a bad idea is the impression I got; I’m glad to hear the confirmation from experience.

simonbyrne 1677 days ago

My point isn't that fast-math isn't useful: it very much is. The problem is that it is a whole grab bag of things that can do very dangerous things. Rather than using a sledgehammer, you should try to be selective and enable only the useful optimizations, e.g. you could just enable -ffp-contract=fast and -fno-math-errno.

djmips 1677 days ago

One thing I don't think you pointed out is that tracking down issues with NaNs seems hard with fast-math since, I believe, it also disables any exceptions that might be useful for alerting you to their formation?

kristofferc 1677 days ago

> Free online numerical methods classes are available

How can you use any numerical methods (like error analysis) if you don't have a solid foundation with strict rules to analyze on top of?