Yes, "floating point accumulation doesn't commute" is a mantra everyone should have in their head, and when I first read this article, I was jumping at the bit to dismiss it out of hand for that reason.
But, what got me about this is that:
* every other Apple device delivered the same results
* Apple's own LLM silently failed on this device
To me, that behavior suggests an unexpected failure rather than a fundamental issue; it seems Bad (TM) that Apple would ship devices where their own LLM didn't work.
I guess at the bit level, but not at the level of computation? Anything that relies on bit patterns of NaNs behaving in a certain way (like how they propagate) is in dangerous territory.
> Anything that relies on bit patterns of NaNs behaving in a certain way (like how they propagate) is in dangerous territory.
Why? This is well specified by IEEE 754. Many runtimes (e.g. for JavaScript) use NaN boxing. Treating floats as a semi-arbitrary selection of rational numbers plus a handful of special values is /more/ correct than treating them as real numbers, but treating them as actually specified does give more flexibility and power.
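For anyone who hasn't seen NaN boxing, here is a minimal sketch of the idea in Python, assuming IEEE 754 binary64 doubles. The names `box`, `unbox`, `QNAN_PREFIX` and `PAYLOAD_MASK` are made up for illustration and are not taken from any particular runtime. It only shows a payload surviving a round trip through storage; it says nothing about what happens to the payload once the NaN goes through arithmetic, which is the part the parent comment is worried about.

```python
import math
import struct

# Quiet NaN in binary64: sign 0, exponent all ones, top mantissa bit set.
QNAN_PREFIX = 0x7FF8_0000_0000_0000
# The remaining 51 mantissa bits are free to carry a payload.
PAYLOAD_MASK = 0x0007_FFFF_FFFF_FFFF


def box(payload: int) -> float:
    """Hide a small non-negative integer in the payload bits of a quiet NaN."""
    if not 0 <= payload <= PAYLOAD_MASK:
        raise ValueError("payload does not fit in 51 bits")
    bits = QNAN_PREFIX | payload
    # Reinterpret the 64-bit pattern as a double without changing any bits.
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def unbox(value: float) -> int:
    """Recover the payload from a boxed NaN."""
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    return bits & PAYLOAD_MASK


boxed = box(42)
print(math.isnan(boxed), unbox(boxed))
```

A real runtime would typically reserve a few of those payload bits as a type tag and store pointers or small integers in the rest, but the bit layout above is the core of the trick.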
I would go even further and state that "you should never assume that floating point functions will evaluate the same on two different computers, or even on two different versions of the same application", as the results of floating point evaluations can differ depending on platform, compiler optimizations, compilation flags, run-time FPU environment (rounding mode, etc.), and even memory alignment of run-time data.
There's a C++26 paper about compile-time math optimizations with a good overview and discussion of some of these issues [P1383]. The paper explicitly states:
1. It is acceptable for evaluation of mathematical functions to differ between translation time and runtime.
2. It is acceptable for constant evaluation of mathematical functions to differ between platforms.
So C++ has very much accepted the fact that floating point functions should not be presumed to give identical results in all circumstances.
Now, it is of course possible to ensure that floating point-related functions give identical results on all your target machines, but it's usually not worth the hassle.
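A small illustration of why, sketched in Python with made-up input values: the same four numbers summed in two different orders give two different answers, and neither matches the exactly rounded sum. Each individual addition is still commutative; what breaks is associativity, so anything that regroups or reorders an accumulation (vectorization, threading, a different compiler flag) can change the result.

```python
import math

# Illustrative values: near 1e16 adjacent doubles are 2 apart,
# so adding 1.0 to 1e16 gets rounded away.
values = [1e16, 1.0, -1e16, 1.0]


def sum_in_order(xs):
    """Plain left-to-right accumulation, one rounding per addition."""
    total = 0.0
    for x in xs:
        total += x
    return total


forward = sum_in_order(values)            # the first 1.0 is lost
reordered = sum_in_order(sorted(values))  # both 1.0s are lost
exact = math.fsum(values)                 # exactly rounded sum

print(forward, reordered, exact)  # 1.0 0.0 2.0
```

`math.fsum` tracks the rounding error internally, which is one way to get an order-independent answer, at a cost in speed. That is the "possible, but a hassle" trade-off in miniature.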
Hey, I appreciate your love of language and sharing with us.
I'm wondering if we couldn't re-think "bit" to the computer science usage instead of the thing that goes in the horse's mouth, and what it would mean for an AI agent to "champ at the bit"?
Actually it was originally "champing" – to grind or gnash teeth. The "chomping" (to bite) alternative cropped up more recently as people misheard and misunderstood, but it's generally accepted as an alternative now.
Do you have a source on this, or a definition for what it means to be "primary" here? All I can find is sources confirming that "champing" is the original and more technically correct, but that "chomping" is an accepted variant.