by omegalulw 1496 days ago
How is the choice between fp16 and fp32 made? Is it that you use fp32 whenever any gradient in the tensor needs the extra range?
2 comments

This article [0] from Nvidia gives a good overview of how mixed precision training works.

Super high level (from section 3):

  1. Converting the model to use the float16 data type where possible.
  2. Keeping float32 master weights to accumulate per-iteration weight updates.
  3. Using loss scaling to preserve small gradient values.
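
To make steps 2 and 3 concrete, here is a rough sketch in plain Python. It is not how a framework implements mixed precision: it only emulates fp16/fp32 rounding with the standard library's `struct` module. The helper names (`to_fp16`, `to_fp32`), the loss scale of 1024, and the gradient and update values are all assumptions picked for illustration.

```python
import struct

def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (fp16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision (fp32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Step 3: loss scaling (values are illustrative assumptions).
# A gradient of 1e-8 is below fp16's smallest subnormal (~6e-8), so it
# rounds to zero when stored in fp16.
tiny_grad = 1e-8
LOSS_SCALE = 1024.0
unscaled = to_fp16(tiny_grad)
# Scaling the loss scales the gradient by the same factor; we simulate
# that by multiplying before rounding to fp16.
scaled = to_fp16(tiny_grad * LOSS_SCALE)
# Unscale in higher precision before applying the update.
recovered = to_fp32(scaled) / LOSS_SCALE

# Step 2: fp32 master weights.
# Near 1.0, fp16 values are ~1e-3 apart, so adding 1e-4 rounds back to 1.0.
update = 1e-4
w16 = to_fp16(1.0)
master = to_fp32(1.0)
for _ in range(100):
    w16 = to_fp16(w16 + update)        # update is lost on every step
    master = to_fp32(master + update)  # update accumulates in fp32

print("fp16 gradient without scaling:", unscaled)
print("gradient recovered via scaling:", recovered)
print("fp16 weight after 100 updates:", w16)
print("fp32 master weight after 100 updates:", master)
```

The unscaled gradient is flushed to zero while the scaled one survives with a small rounding error, and the fp16 weight never moves while the fp32 master copy does. That is roughly why the master weights and the scaling factor exist.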

[0] https://docs.nvidia.com/deeplearning/performance/mixed-preci...

The PyTorch docs give a pretty good overview of AMP here: https://pytorch.org/tutorials/recipes/recipes/amp_recipe.htm...

An overview of which operations cast to which dtype can be found here: https://pytorch.org/docs/stable/amp.html#autocast-op-referen...

Edit: Fixed second link.