by wongarsu
781 days ago
The last couple of years have been a steady journey of us discovering that in most neural networks precision only matters in a couple of key places, and everything else can get away with astonishingly little. We started out training everything in full (f32) or double precision (f64). Then around 2020 everyone switched to half precision (f16) with some stuff kept in full precision. Now we are starting to move to quarter precision (f8), and the newest Nvidia card even supports f4 (eighth precision?). And then of course there's the 1.58-bit LLM paper. So there has been a steady stream of people questioning the underlying precision, and most of the time the answer they came back with was: there's more precision than we need, and a larger network with less precision is faster and better than a smaller network with more precision.
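To get a feel for how much each step down costs, here is a minimal sketch using only Python's standard `struct` module, which can pack IEEE 754 double, single, and half precision values. The helper name `round_trip` is made up for illustration, and f8/f4 are left out because `struct` has no format code for them.

```python
import struct


def round_trip(value: float, fmt: str) -> float:
    """Store value in the given IEEE 754 format and read it back as a Python float."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


x = 0.1  # not exactly representable in any binary float format

as_f64 = round_trip(x, "<d")  # double precision, same as Python's own float
as_f32 = round_trip(x, "<f")  # full (single) precision
as_f16 = round_trip(x, "<e")  # half precision

for name, v in [("f64", as_f64), ("f32", as_f32), ("f16", as_f16)]:
    # Error is measured against the f64 value, since that is what Python stores.
    print(f"{name}: {v!r}  abs error vs f64: {abs(v - x):.3e}")
```

The error grows by several orders of magnitude at each step, which is what makes it surprising that networks tolerate it so well.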
AFAIK the determinism side of floating-point precision hasn’t been well-addressed, but it’s been a while since I skimmed those papers.
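For context, the usual root of the determinism problem is that floating-point addition isn't associative, so summing the same values in a different order (as a parallel reduction may do from run to run) can give different bits. A minimal sketch in plain Python, with illustrative names and an explicit loop so the accumulation order is exactly what is written:

```python
def sum_in_order(values):
    """Accumulate left to right with plain float addition."""
    total = 0.0
    for v in values:
        total += v
    return total


values = [0.1, 0.2, 0.3]

forward = sum_in_order(values)                   # (0.1 + 0.2) + 0.3
backward = sum_in_order(list(reversed(values)))  # (0.3 + 0.2) + 0.1

print(repr(forward), repr(backward), forward == backward)
```

The two results differ in the last bit even at double precision; lower-precision formats only make such order effects larger.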