Hacker News
by dlewis1788 1086 days ago
My understanding is that for certain types of networks BF16 will train better than FP16. Its extended range (8 exponent bits, the same as FP32, versus 5 in FP16) gives additional protection against exploding gradients and overflowing loss values, at the cost of precision (7 mantissa bits versus 10).
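To make the range-versus-precision trade-off concrete, here is a rough stdlib-only sketch. The helpers `to_fp16` and `to_bf16` are names I made up for illustration, not any framework's API. It emulates BF16 by rounding the FP32 bit pattern down to its top 16 bits and handles finite inputs only, so treat it as a toy rather than how real hardware or libraries do it.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE half-precision value (hypothetical helper).

    struct's "e" format raises OverflowError past the FP16 range
    (max finite value 65504), so that case is mapped to +/-inf here.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def to_bf16(x: float) -> float:
    """Round x to the nearest BF16 value (hypothetical helper, finite inputs only).

    BF16 is the top 16 bits of an FP32 value, so this rounds the low
    16 bits away using round-to-nearest-even and zeroes them out.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round to nearest, ties to even
    bits &= 0xFFFF0000                   # keep sign, 8 exponent bits, 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


if __name__ == "__main__":
    for value in (70000.0, 1.001):
        print(value, "fp16:", to_fp16(value), "bf16:", to_bf16(value))
```

With these helpers, 70000.0 overflows to `inf` in FP16 but survives in BF16 as 70144.0 (coarse, but finite), while 1.001 stays at 1.0009765625 in FP16 and collapses to 1.0 in BF16. That is the same trade in miniature: BF16 keeps large gradients finite and FP16 keeps small differences visible.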