Hacker News new | ask | show | jobs
by andoma 1492 days ago
This article [0] from Nvidia gives a good overview of how mixed precision training works.

Super high level (from section 3):

  1. Converting the model to use the float16 data type where possible.
  2. Keeping float32 master weights to accumulate per-iteration weight updates.
  3. Using loss scaling to preserve small gradient values.
[0] https://docs.nvidia.com/deeplearning/performance/mixed-preci...