by wongarsu 781 days ago
The last couple of years have been a steady journey of us discovering that in most neural networks precision only matters in a couple key places, and everything else can get away with astonishingly little.

We started out training everything in full (f32) or double precision (f64). Around 2020 everyone switched to half precision (f16) with some parts kept in full precision, and now we are starting to move to quarter precision (f8). The newest Nvidia card even supports f4 (eighth precision?). And then of course there's the 1.58-bit LLM paper.

So there has been a steady stream of people questioning the underlying precision, and most of the time the answer they came back with was that there's more precision than we need: a larger network with less precision is faster and better than a smaller network with more precision.
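To make the formats above a bit more concrete, here is a minimal sketch that stores the same value in f64, f32 and f16 and reads it back. It only uses Python's standard `struct` module, which has no f8 or f4 codes, so those formats are not shown. The helper name `round_trip` is made up for this illustration.

```python
import struct

def round_trip(fmt: str, x: float) -> float:
    """Store x in the given IEEE 754 format and read it back as a Python float."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

# struct format codes: "d" = f64, "f" = f32, "e" = f16
for name, fmt in [("f64", "d"), ("f32", "f"), ("f16", "e")]:
    print(name, repr(round_trip(fmt, 0.1)))

# f16 has an 11-bit significand, so between 2048 and 4096 it can only
# represent even integers; 2049 has to be rounded to a neighbour.
print(round_trip("e", 2049.0))
```

In f16, 0.1 comes back as 0.0999755859375 and 2049 comes back as 2048, which gives a feel for how coarse the grid already is at half precision.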

1 comment

To be clear there’s a distinction between the quality of the results and the determinism of the results. If a low-precision LLM is wildly stochastic but the variation is mostly linguistic rather than factual or deductive (e.g. coin tosses on synonyms or presenting independent facts in a different order), then there’s not really a contradiction.

AFAIK the determinism side of floating-point precision hasn’t been well addressed, but it’s been a while since I skimmed those papers.
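One concrete way precision and determinism interact: floating-point addition is not associative, so the same numbers summed in a different order can give a different result, and the gap widens as precision drops. The sketch below emulates f16 accumulation by rounding to f16 after every addition. `to_f16` and `sum_f16` are hypothetical helpers written for this example, and whether reordered sums are what actually drives run-to-run variation in a given LLM stack is an assumption here, not something the thread establishes.

```python
import struct

def to_f16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]

def sum_f16(values):
    """Sum left to right, rounding to f16 after every addition."""
    total = 0.0
    for v in values:
        total = to_f16(total + to_f16(v))
    return total

# Same three numbers, two different orders.
a = sum_f16([2048.0, 1.0, 1.0])   # each 1.0 is rounded away in turn
b = sum_f16([1.0, 1.0, 2048.0])   # the 1.0s combine to 2.0 first
print(a, b)

# The effect also exists in f64; it is just much smaller.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))
```

Here `a` is 2048.0 and `b` is 2050.0, and the f64 comparison prints `False`. Any computation whose summation order can change between runs can therefore change its output bits, independent of whether the result quality is affected.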