by Y_Y
809 days ago
That's not entirely true. Current-gen Nvidia hardware can use fp8, and the newly announced Blackwell can do fp4. Lots of existing specialized inference hardware uses int8, and some uses int4. You're right that low-precision training still doesn't seem to work, presumably because you lose the smoothness that SGD-type optimization relies on.
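A toy sketch of the smoothness point (my own illustration, not any framework's API): assume a single weight stored in symmetric int8 with a hypothetical scale of 1/127, so each step is about 0.0079. If you dequantize, apply a plain SGD update, and re-quantize, any update smaller than half a step rounds back to the same integer and is lost entirely.

```python
# Toy illustration (assumption): one weight stored in symmetric int8,
# with a hypothetical per-tensor scale mapping [-1, 1] onto [-127, 127].
SCALE = 1.0 / 127


def quantize(x: float, scale: float = SCALE) -> int:
    """Round x to the nearest int8 step and clamp to [-127, 127]."""
    q = round(x / scale)
    return max(-127, min(127, q))


def dequantize(q: int, scale: float = SCALE) -> float:
    """Map an int8 code back to a real value."""
    return q * scale


def sgd_step_int8(q: int, grad: float, lr: float) -> int:
    """One plain SGD step on a weight that lives in int8.

    The weight is dequantized, updated in float, then re-quantized,
    so any change smaller than half a step (SCALE / 2) is rounded away.
    """
    w = dequantize(q)
    return quantize(w - lr * grad)


q0 = quantize(0.25)                           # 0.25 * 127 = 31.75 -> code 32
small = sgd_step_int8(q0, grad=0.2, lr=0.01)  # update 0.002 < SCALE / 2
large = sgd_step_int8(q0, grad=1.0, lr=0.01)  # update 0.01 > SCALE / 2

print(q0, small, large)  # 32 32 31
```

The small update vanishes while the larger one jumps a whole step, so the weight moves in coarse jumps instead of following the gradient smoothly. This is one reason mixed-precision training setups usually keep a higher-precision master copy of the weights.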