| HN Mirror



	by Affric 383 days ago
	How does IEEE 754 prevent auto-vectorisation?

4 comments

dahart 382 days ago

The spec doesn’t prevent auto-vectorization, it only says the language should avoid it when it wants to opt in to producing “reproducible floating-point results” (section 11 of IEEE 754-2019). Vectorizing can be implemented in different ways, so whether a language avoids vectorizing in order to opt in to reproducible results is implementation dependent. It also depends on whether there is an option to not vectorize. If a language only had auto-vectorization, and the vectorization result was deterministic and reproducible, and if the language offered no serial mode, this could adhere to the IEEE spec. But since C++ (for example) offers serial reductions in debug & non-optimized code, and it wants to offer reproducible results, then it has to be careful about vectorizing without the user’s explicit consent.

kzrdude 383 days ago

If you write a loop `for x in array { sum += x }`, then your program is a specification that you want to add the elements in exactly that order, one by one. Vectorization would change the order.
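A rough sketch of the reordering described above: a vectorized reduction typically keeps one partial sum per lane and combines the lanes at the end. The function names and the 2-lane width are illustrative assumptions, not what any particular compiler emits.

```python
def sequential_sum(values):
    """Add the elements one by one, in exactly the order written."""
    total = 0.0
    for x in values:
        total += x
    return total


def lane_sum(values, lanes=2):
    """Mimic a vectorized reduction: one partial sum per lane, combined at the end."""
    partial = [0.0] * lanes
    for i, x in enumerate(values):
        partial[i % lanes] += x  # element i is accumulated in lane i mod lanes
    return sequential_sum(partial)  # final horizontal add of the lane totals


data = [0.1] * 10
seq = sequential_sum(data)
vec = lane_sum(data, lanes=2)
print(seq, vec, seq == vec)
```

Both results are roundings of the same mathematical sum; they differ only in how the additions are grouped.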

dahart 382 days ago

The bigger problem there is the language not offering a way to signal the author’s intent. If an author doesn’t care about the order of operations in a sum, they will still write the exact same code as the author who does care. This is a failure of the language to be expressive enough, and doesn’t reflect on the IEEE spec. (The spec even does suggest that languages should offer and define these sorts of semantics.) Whether the program is specifying an order of operations is lost when the language offers no way for a coder to distinguish between caring about order and not caring. This is especially difficult since the vast majority of people don’t care and don’t consider their own code to be a specification on order of operations. Worse, most people would even be surprised and/or annoyed if the compiler didn’t do certain simplifications and constant folding, which change the results. The few cases where people do care about order can be extremely important, but they are rare nonetheless.

stingraycharles 383 days ago

Yup, because of the imprecision of floating point, you cannot just assume that “(a + c) + (b + d)” is the same as “a + b + c + d”.

It would be pretty ironic if at some point fixed point / bignum implementations end up being faster because of this.
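To make the regrouping concrete, here is a small example with values chosen (as an assumption, for illustration) so that the two groupings give visibly different answers in double precision:

```python
a, b, c, d = 1e16, 1.0, -1e16, 1.0

in_order = a + b + c + d        # evaluated as ((a + b) + c) + d
regrouped = (a + c) + (b + d)   # same operands, different grouping

print(in_order)   # 1.0: the first 1.0 is rounded away when added to 1e16
print(regrouped)  # 2.0: the large values cancel before the small ones are added
```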

anthk 382 days ago

They are; just compare anything fixed-point on a 486SX with anything floating-point on a 486DX. It's faster to scale, sum, and print at the desired precision than to operate on floats.
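A minimal sketch of the scale, sum, and print approach, assuming a scale of 100 (two decimal digits); `SCALE` and `format_fixed` are hypothetical names. Integer addition is associative, so the order of summation cannot change the result. Python integers never overflow, which a native fixed-point type would have to handle separately.

```python
import random

SCALE = 100  # assumed fixed-point scale: values are stored as integer hundredths


def format_fixed(n):
    """Render an integer number of hundredths as a decimal string."""
    sign = "-" if n < 0 else ""
    whole, frac = divmod(abs(n), SCALE)
    return f"{sign}{whole}.{frac:02d}"


# 0.01, 1e18, 0.01, -1e18 expressed in hundredths
values = [1, 10**18 * SCALE, 1, -(10**18) * SCALE]

forward = sum(values)
shuffled = values[:]
random.Random(0).shuffle(shuffled)
reordered = sum(shuffled)
print(format_fixed(forward), format_fixed(reordered))  # identical in any order

# The same quantities as doubles, added left to right with an explicit loop
float_total = 0.0
for x in [0.01, 1e18, 0.01, -1e18]:
    float_total += x
print(float_total)  # 0.0: both small terms were absorbed by 1e18
```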

stingraycharles 379 days ago

Is that also the case for modern architectures? Eg is there SIMD fixed precision?

einpoklum 382 days ago

I wonder... couldn't there just be some library type for this, e.g. `associative::float` and `associative::double` and such (in C++ terms), so that compilers can ignore non-associativity for actions on values of these types? Or attributes one can place on variables to force assumption of associativity?

Kubuxu 383 days ago

IIRC reordering additions can cause the result to change which makes auto-vectorisation tricky.

goalieca 382 days ago

Floating point arithmetic is neither commutative nor associative, so you shouldn’t.

lo0dot0 382 days ago

While it is technically correct to say this, it also gets the wrong point across, because it leaves out the fact that ordering changes create only a small difference. Other examples where arithmetic is not commutative, e.g. matrix multiplication, can create much larger differences.

kstrauser 382 days ago

> ordering changes create only a small difference.

That can’t be assumed.

You can easily fall into a situation like:

  large_float_value = 1e18  # assumed example value; the original leaves it undefined
  total = large_float_value
  for _ in range(1_000_000_000):
    total += .01
  assert total == large_float_value

Without knowing the specific situation, it’s impossible to say whether that’s a tolerably small difference.
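A runnable variant of the situation above, as a sketch: `1e18` is an assumed stand-in for the large value, and the loop count is cut to one million so it finishes quickly. It also shows that summing the small terms first, or using `math.fsum`, recovers most of what the naive order loses.

```python
import math

large = 1e18   # assumed stand-in for large_float_value; adjacent doubles are 128 apart here
n = 1_000_000  # fewer iterations than the original loop, to keep the run short

# Naive order: every 0.01 is far below half the spacing, so it rounds away.
total = large
for _ in range(n):
    total += 0.01

# Small terms first: they accumulate to roughly 10000 before meeting the large value.
small_first = 0.0
for _ in range(n):
    small_first += 0.01
small_first += large

# math.fsum returns the correctly rounded sum of the same terms.
exact = math.fsum([large] + [0.01] * n)

print(total == large)       # True: a million additions changed nothing
print(small_first - large)  # nonzero: most of the small contribution survived
print(exact - large)
```

Whether a lost contribution of about 10000 against 1e18 is tolerable depends entirely on what the number is used for.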

StefanKarpinski 380 days ago

Floating-point arithmetic is non-associative, but it is commutative for the operations that are algebraically commutative: x + y == y + x and x*y == y*x. And x - y == -(y - x), so subtraction is properly anti-commutative.

The only very marginal exception to this is that when both arguments are NaN, the return value will be NaN, but which NaN payload is returned can depend on argument order. But no one ever uses this because it's not specified, so it can't be used reliably for anything useful. The behavior I wish IEEE 754 had specified for this is to define a standard NaN value (or two), and when the return value of an op is NaN, and some of the arguments are non-standard NaNs, then one of those non-standard NaN values must be returned. This doesn't depend on argument order and allows NaN payloads to be reliably propagated, which would let you encode useful debugging information in NaN payloads and know that it will flow through the program.
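One way to look at the NaN payload behavior on a given machine is to build two quiet NaNs with different payloads and inspect the raw bits of the result in both argument orders. This is only a sketch: `nan_with_payload` and `bits` are hypothetical helpers, and the printed bit patterns depend on the hardware, so no particular output is claimed.

```python
import math
import struct

QUIET_NAN_BITS = 0x7FF8_0000_0000_0000  # exponent all ones plus the quiet bit


def nan_with_payload(payload):
    """Build a quiet NaN carrying `payload` in its low mantissa bits."""
    return struct.unpack("<d", struct.pack("<Q", QUIET_NAN_BITS | payload))[0]


def bits(x):
    """Return the raw 64-bit pattern of a float as an integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


a = nan_with_payload(1)
b = nan_with_payload(2)

# Both results are NaN; which payload (if any) survives is platform-dependent.
print(hex(bits(a + b)))
print(hex(bits(b + a)))
print(math.isnan(a + b), math.isnan(b + a))
```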

layer8 382 days ago

IEEE-754 addition and multiplication are commutative. It isn't distributive, though.
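These properties are easy to probe empirically. The following sketch counts how often each identity fails on random doubles in [0, 1); the seed and trial count are arbitrary choices.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 10_000
commutative_failures = 0
associative_failures = 0
distributive_failures = 0

for _ in range(trials):
    x, y, z = rng.random(), rng.random(), rng.random()
    if x + y != y + x or x * y != y * x:
        commutative_failures += 1
    if (x + y) + z != x + (y + z):
        associative_failures += 1
    if x * (y + z) != x * y + x * z:
        distributive_failures += 1

print(commutative_failures, associative_failures, distributive_failures)
```

Commutativity never fails for non-NaN inputs, while the associative and distributive identities fail for a noticeable fraction of the triples.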

eapriv 382 days ago

Why is it not commutative?

layer8 382 days ago

It actually is commutative according to IEEE-754, except that in the case of a NaN result you might get a different NaN representation.

adgjlsfhk1 382 days ago

having multiple NaNs and no spec for how they should behave feels like such an unforced error to me

layer8 382 days ago

For mathematical use, NaN payloads shouldn’t matter, and behave identically (aside from quiet vs. signaling NaNs). It also doesn’t matter for equality comparison, because NaNs always compare unequal.

adgjlsfhk1 381 days ago

from the user perspective it's not too bad, but from the compiler perspective it is. The result of this is that LLVM has decided that trying to figure out which NaN you got (e.g. by casting to an integer and comparing) is UB, which means pretty much every floating point operation becomes non-deterministic.

This also adds extra complexity to the CPU. You need special hardware for == rather than just using the perfectly good integer unit, and every FPU operation needs to devote a bunch of transistors to handling this nonsense that buys the user absolutely nothing.

there are definitely things to criticize about the design of Posits, but the thing they 100% get right is having a single NaN and sane ordering semantics