by nyrikki 96 days ago
> Can't they run an example prompt and verify they get the exact same output token probabilities for all prompts?

You don’t even get that with GPUs in general, or really floating point in general.

Section 4.2.2 of The Art of Computer Programming, Volume 2: Seminumerical Algorithms will explain where floating-point addition loses its associativity property.
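
A minimal Python sketch of the effect, assuming ordinary IEEE 754 doubles. This is an illustration only, not taken from Knuth, and `naive_sum` is a hypothetical helper written here to make the evaluation order explicit (it stands in for a reduction whose order can change, as it can across GPU kernels or batch sizes):

```python
# Floating-point addition is not associative: grouping changes the result.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c   # rounds after a + b, then again after adding c
right = a + (b + c)  # rounds after b + c, then again after adding a
print(left == right)  # False


def naive_sum(values):
    """Left-to-right accumulation, so the order of operations is explicit."""
    total = 0.0
    for v in values:
        total += v
    return total


# Same three numbers, two reduction orders, two different answers.
print(naive_sum([1e16, 1.0, -1e16]))  # 0.0, the 1.0 is absorbed by 1e16
print(naive_sum([1e16, -1e16, 1.0]))  # 1.0, the big terms cancel first
```

The explicit loop is deliberate: newer Python versions may use a compensated algorithm inside the built-in `sum()`, which would hide the difference.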

Apartness relations are another possible lens.


> However, as the name “batch-invariant” suggests, the technique is currently limited to handling variations related only to the batch dimension, making it robust to continuous batching and other batch-size–related changes, but not to other forms of nondeterminism like changing the TP sizes or GPU types.

https://arxiv.org/abs/2506.09501