HN Mirror



	by omegalulw 1496 days ago
	How is the choice between fp16 and fp32 made? Is it that you use fp32 whenever any gradients in the tensor need the extra range?

2 comments

andoma 1496 days ago

This article [0] from Nvidia gives a good overview of how mixed precision training works.

Super high level (from section 3):

  1. Converting the model to use the float16 data type where possible.
  2. Keeping float32 master weights to accumulate per-iteration weight updates.
  3. Using loss scaling to preserve small gradient values.
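To make steps 2 and 3 concrete, here is a minimal standard-library sketch, not how Nvidia or any framework implements it. It assumes Python's `struct` format `"e"` (IEEE 754 half precision) as a stand-in for fp16 storage and a Python float as a stand-in for the fp32 master copy. The names `to_fp16`, `LOSS_SCALE` and `LR_TIMES_GRAD`, and all the numbers, are made up for illustration.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


# Step 3: loss scaling. A gradient this small is below fp16's smallest
# subnormal (about 6e-8), so storing it in fp16 flushes it to zero.
tiny_grad = 1e-8
LOSS_SCALE = 1024.0  # hypothetical scale factor, chosen for illustration

unscaled = to_fp16(tiny_grad)             # information lost
scaled = to_fp16(tiny_grad * LOSS_SCALE)  # survives as an fp16 subnormal
recovered = scaled / LOSS_SCALE           # unscale in higher precision

# Step 2: master weights. Near 1.0, neighbouring fp16 values are roughly
# 5e-4 to 1e-3 apart, so an update of 1e-4 is rounded away every time.
LR_TIMES_GRAD = 1e-4  # hypothetical per-step update (learning rate * gradient)

fp16_only = 1.0  # weight stored and updated purely in fp16
master = 1.0     # higher-precision master weight (stand-in for fp32)
for _ in range(10):
    fp16_only = to_fp16(fp16_only - LR_TIMES_GRAD)
    master -= LR_TIMES_GRAD

fp16_copy = to_fp16(master)  # fp16 copy that the forward pass would use

print("unscaled grad in fp16:", unscaled)
print("recovered grad:", recovered)
print("fp16-only weight:", fp16_only)
print("fp16 copy of master weight:", fp16_copy)
```

In this toy setup the fp16-only weight never moves, while the master weight accumulates the ten small updates and its fp16 copy eventually changes. In real training the scale is applied to the loss before the backward pass, so the gradients come out scaled by the chain rule; multiplying the gradient directly here is a simplification.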

[0] https://docs.nvidia.com/deeplearning/performance/mixed-preci...


h-jones 1496 days ago

The PyTorch docs give a pretty good overview of AMP here: https://pytorch.org/tutorials/recipes/recipes/amp_recipe.htm... An overview of which operations autocast to which dtype can be found here: https://pytorch.org/docs/stable/amp.html#autocast-op-referen...

Edit: Fixed second link.
