by andoma
1492 days ago
This article [0] from Nvidia gives a good overview of how mixed precision training works. Super high level (from section 3):

1. Converting the model to use the float16 data type where possible.
2. Keeping float32 master weights to accumulate per-iteration weight updates.
3. Using loss scaling to preserve small gradient values.
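
To make points 2 and 3 concrete, here is a minimal NumPy sketch of the two numeric effects they work around. It is not taken from the Nvidia article: the helper names `fp16_grad` and `apply_updates`, the loss scale of 1024, and the example values are all assumptions chosen for illustration.

```python
import numpy as np

def fp16_grad(grad, loss_scale=1.0):
    """Hypothetical helper: round a (scaled) gradient to float16,
    then unscale it in float32, as loss scaling would."""
    scaled = np.float16(grad * loss_scale)
    return np.float32(scaled) / np.float32(loss_scale)

def apply_updates(weight_dtype, start, update, steps):
    """Hypothetical helper: add the same small update `steps` times,
    keeping the weight in `weight_dtype` the whole time."""
    w = weight_dtype(start)
    for _ in range(steps):
        w = weight_dtype(w + weight_dtype(update))
    return w

# Point 3: a gradient of 1e-8 is below float16's smallest subnormal
# (about 6e-8), so it flushes to zero unless it is scaled up first.
lost = fp16_grad(1e-8)                      # no loss scaling
kept = fp16_grad(1e-8, loss_scale=1024.0)   # assumed scale factor

# Point 2: near 1.0 the float16 spacing is about 1e-3, so an update of
# 1e-4 is rounded away on every step. A float32 master weight keeps it.
w_half = apply_updates(np.float16, 1.0, 1e-4, steps=10)
w_master = apply_updates(np.float32, 1.0, 1e-4, steps=10)

# Point 1: the float16 copy used for compute is re-derived from the master.
w_compute = np.float16(w_master)

print(lost, kept)
print(w_half, w_master, w_compute)
```

In a real framework the scale factor is usually adjusted dynamically and steps with overflowing gradients are skipped; this sketch leaves that out.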

[0] https://docs.nvidia.com/deeplearning/performance/mixed-preci...