by imtringued
817 days ago
The actual problem is that nobody uses these low-precision floats for training their models. When you quantize, you are merely compressing the weights to minimize memory usage and to use memory bandwidth more efficiently. You still have to run the model at the original precision for the calculations, so nobody gives a damn about the low-precision floats for now.
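To make the "compress for storage, compute at the original precision" point concrete, here is a minimal sketch of weight-only quantization. It assumes symmetric per-tensor int8 with a single scale, which is only one of many schemes; the function names (`quantize_int8`, `dequantize`, `dot`) and the sample numbers are made up for illustration.

```python
def quantize_int8(weights):
    """Compress floats to int8 codes plus one float scale (symmetric, per-tensor)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp into the int8 range used here.
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Expand the stored codes back to floats before doing any arithmetic."""
    return [c * scale for c in codes]


def dot(weights, inputs):
    """Plain floating-point dot product; the math never runs on the int8 codes."""
    return sum(w * x for w, x in zip(weights, inputs))


# Hypothetical weights and activations, chosen only for the demo.
weights = [0.12, -0.5, 0.33, 0.9]
inputs = [1.0, 2.0, -1.0, 0.5]

codes, scale = quantize_int8(weights)   # what gets stored and moved around
restored = dequantize(codes, scale)     # what the computation actually sees

exact = dot(weights, inputs)
approx = dot(restored, inputs)
print(codes, scale, exact, approx)
```

The int8 codes are what save memory and bandwidth; the dot product itself still runs on ordinary floats after dequantization, with a small rounding error relative to the original weights.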
|
You're right that low-precision training still doesn't seem to work, presumably because you lose the smoothness required for SGD-type optimization.
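One way low precision can hurt SGD, sketched below, is that small updates get rounded away entirely. This is an assumed illustration of the general idea, not necessarily the exact mechanism meant above. It uses half precision (the `"e"` format of Python's `struct` module) as a stand-in for a low-precision float; the helper names and the constant gradient are hypothetical.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def sgd_steps(w, grad, lr, steps, cast):
    """Run plain SGD with a constant gradient, rounding the weight after each step."""
    for _ in range(steps):
        w = cast(w - lr * grad)
    return w


# Near 1.0, neighbouring fp16 values are about 5e-4 apart, so a 1e-4 step
# rounds straight back to the starting weight.
w_fp16 = sgd_steps(1.0, grad=1.0, lr=1e-4, steps=1000, cast=to_fp16)
w_fp64 = sgd_steps(1.0, grad=1.0, lr=1e-4, steps=1000, cast=float)
print(w_fp16, w_fp64)
```

In this toy setup the half-precision weight never moves, while the double-precision weight ends up close to 0.9.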