Hacker News
by raincole 134 days ago
Low-level numerical operation optimizations are often not reproducible. For example: https://www.intel.com/content/dam/develop/external/us/en/doc... (2013)

But it's still surprising that the LLM doesn't work on iPhone 16 at all. After all, LLMs are known for their tolerance to quantization.
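
To put a rough number on "tolerance to quantization", here is a minimal sketch of symmetric int8 quantization. The weights and the helper names (`quantize_int8`, `dequantize`) are made up for illustration and are not taken from any particular LLM runtime. The point is only that the round-trip error can be as large as half a quantization step, which is many orders of magnitude bigger than a last-bit floating-point difference.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    scale = max(abs(w) for w in weights) / 127.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integer codes."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only for illustration.
weights = [0.82, -1.27, 0.031, 0.5, -0.004]

quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(quantized)
print(scale, max_error)
```

A model that still works after this kind of rounding would normally be expected to shrug off ordinary accumulation-order noise, which is why a complete failure looks more like a bug than like rounding.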


Yes, "floating point accumulation doesn't commute" is a mantra everyone should have in their head, and when I first read this article, I was jumping at the bit to dismiss it out of hand for that reason.

But, what got me about this is that:

* every other Apple device delivered the same results

* Apple's own LLM silently failed on this device

To me, that behavior suggests an unexpected failure rather than a fundamental issue; it seems Bad (TM) that Apple would ship devices where their own LLM didn't work.

> floating point accumulation doesn't commute

It is commutative (except for NaN). It isn't associative though.
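
A quick way to see the distinction, as a small Python sketch using ordinary IEEE 754 doubles: swapping the operands of one addition changes nothing, but regrouping three operands changes where the intermediate rounding happens.

```python
# Floating-point addition: commutative, but not associative.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # rounds after a + b, then again after adding c
right = a + (b + c)  # rounds after b + c, then again after adding a

commutes = (a + b == b + a)
associates = (left == right)

print(commutes)     # True
print(associates)   # False
print(left, right)  # 0.6000000000000001 0.6
```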

I think it commutes even when one or both inputs are NaN? The output is always NaN.
NaNs are distinguishable. /Which/ NaN you get doesn't commute.
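
A sketch of what "distinguishable" means here, assuming a typical CPython build where `struct` round-trips quiet-NaN bit patterns unchanged. The payload values 0x1 and 0x2 are arbitrary. Which payload survives `nan_a + nan_b` versus `nan_b + nan_a` depends on the hardware, so the last line prints the bits without claiming a particular result.

```python
import math
import struct


def bits_to_float(bits: int) -> float:
    """Reinterpret a 64-bit integer as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def float_to_bits(x: float) -> int:
    """Reinterpret an IEEE 754 double as a 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


QUIET_NAN = 0x7FF8_0000_0000_0000  # exponent all ones, quiet bit set

# Two NaNs that differ only in their payload bits.
nan_a = bits_to_float(QUIET_NAN | 0x1)
nan_b = bits_to_float(QUIET_NAN | 0x2)

ab = nan_a + nan_b
ba = nan_b + nan_a

# Both results are NaN regardless of operand order...
print(math.isnan(ab), math.isnan(ba))
# ...but the inputs are different bit patterns,
print(hex(float_to_bits(nan_a)), hex(float_to_bits(nan_b)))
# and which payload each sum carries is platform dependent.
print(hex(float_to_bits(ab)), hex(float_to_bits(ba)))
```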
I guess at the bit level, but not at the level of computation? Anything that relies on bit patterns of NaNs behaving in a certain way (like how they propagate) is in dangerous territory.
> Anything that relies on bit patterns of NaNs behaving in a certain way (like how they propagate) is in dangerous territory.

Why? This is well specified by IEEE 754. Many runtimes (e.g. for JavaScript) use NaN boxing. Treating floats as a semi-arbitrary selection of rational numbers plus a handful of special values is /more/ correct than treating them as real numbers, but treating them as actually specified does give more flexibility and power.
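
For anyone who hasn't seen NaN boxing, a toy sketch of the idea in Python. The layout below (bit 48 as a tag, low 32 bits as the value) and the names `box_u32`, `unbox_u32` and `is_boxed` are hypothetical; real runtimes each pick their own layout. It assumes quiet-NaN bit patterns survive a round trip through `struct`, as they do on typical CPython builds.

```python
import math
import struct

# Hypothetical layout: a quiet NaN (0x7FF8...) with bit 48 set as a tag,
# and the boxed 32-bit value stored in the low 32 payload bits.
BOX_TAG = 0x7FF9_0000_0000_0000
TAG_MASK = 0xFFFF_0000_0000_0000
PAYLOAD_MASK = 0x0000_0000_FFFF_FFFF


def float_to_bits(x: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def box_u32(value: int) -> float:
    """Hide an unsigned 32-bit integer inside the payload of a quiet NaN."""
    if not 0 <= value <= PAYLOAD_MASK:
        raise ValueError("value does not fit in 32 bits")
    return struct.unpack("<d", struct.pack("<Q", BOX_TAG | value))[0]


def is_boxed(x: float) -> bool:
    """True if x carries our tag rather than being an ordinary double."""
    return (float_to_bits(x) & TAG_MASK) == BOX_TAG


def unbox_u32(x: float) -> int:
    """Extract the integer stored by box_u32."""
    if not is_boxed(x):
        raise ValueError("not a boxed value")
    return float_to_bits(x) & PAYLOAD_MASK


boxed = box_u32(123456)
print(math.isnan(boxed), is_boxed(boxed), unbox_u32(boxed))
print(is_boxed(1.5))
```

The scheme only works while nothing does arithmetic on the boxed value, since an operation is free to return a different NaN.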

Unless you compile with fast-math ofc, because then the compiler will assume that NaN never occurs in the program.
I would go even further and state that "you should never assume that floating point functions will evaluate the same on two different computers, or even on two different versions of the same application", as the results of floating point evaluations can differ depending on platform, compiler optimizations, compilation flags, run-time FPU environment (rounding mode, &c.), and even memory alignment of run-time data.
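
One concrete mechanism behind this is reduction order. A vectorized or multi-threaded sum typically keeps several partial accumulators and combines them at the end, which regroups the additions. The sketch below imitates that in plain Python; `lane_sum` is a hypothetical stand-in for what a two-lane SIMD reduction might do, not the output of any real compiler, and the input values are picked to make the effect obvious.

```python
def sequential_sum(values):
    """Add left to right with a single accumulator."""
    total = 0.0
    for v in values:
        total += v
    return total


def lane_sum(values, lanes):
    """Accumulate into `lanes` strided partial sums, then combine them."""
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] += v
    return sequential_sum(partial)


# At magnitude 1e16 neighbouring doubles are 2 apart, so adding 1.0 is lost.
values = [1e16, 1.0, -1e16, 1.0]

print(sequential_sum(values))  # 1.0
print(lane_sum(values, 2))     # 2.0
```

Same inputs, same operation, two different answers, purely because of how the additions were grouped.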

There's a C++26 paper about compile time math optimizations with a good overview and discussion about some of these issues [P1383]. The paper explicitly states:

1. It is acceptable for evaluation of mathematical functions to differ between translation time and runtime.

2. It is acceptable for constant evaluation of mathematical functions to differ between platforms.

So C++ has very much accepted the fact that floating point functions should not be presumed to give identical results in all circumstances.

Now, it is of course possible to ensure that floating point-related functions give identical results on all your target machines, but it's usually not worth the hassle.
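
As a small illustration of what "ensuring identical results" can look like for one narrow case (summation), here is a sketch using Python's `math.fsum`, which is documented to return a correctly rounded sum and is therefore insensitive to input order on typical IEEE 754 builds. This says nothing about transcendental functions, which are the harder part.

```python
import itertools
import math

values = [1e16, 1.0, -1e16, 1.0]

# Naive left-to-right summation: the result depends on the input order.
naive_results = {sum(p) for p in itertools.permutations(values)}

# Correctly rounded summation: every ordering gives the same answer.
fsum_results = {math.fsum(p) for p in itertools.permutations(values)}

print(sorted(naive_results))
print(sorted(fsum_results))
```

The price is speed, which is a large part of why it's usually not worth the hassle.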

[P1383]: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p13...

Even the exact same source code can give different results when compiled with different compilers, or with the same compiler and different compiler options.

The Intel compiler, for example, uses less than IEEE 754 precision for floating point ops by default.

FYI, the saying is "champing at the bit", it comes from horses being restrained.
Huh. I never knew "champing" was the proper spelling [0]

[0] https://www.npr.org/sections/memmos/2016/06/09/605796769/che...

hey, I appreciate your love of language and sharing with us.

I'm wondering if we couldn't re-think "bit" to the computer science usage instead of the thing that goes in the horse's mouth, and what it would mean for an AI agent to "champ at the bit"?

What new sayings will we want?

Byting at the bit?
chomping at the bit
Actually it was originally "champing" – to grind or gnash teeth. The "chomping" (to bite) alternative cropped up more recently as people misheard and misunderstood, but it's generally accepted as an alternative now.
I see
It’s actually accepted as the primary now and telling people about “champing” is just seen as archaic.
Do you have a source on this, or a definition for what it means to be "primary" here? All I can find is sources confirming that "champing" is the original and more technically correct, but that "chomping" is an accepted variant.
As a sister comment said, floating point computations are commutative, but not associative.

a * b = b * a for all "normal" floating point numbers.