Hacker News new | ask | show | jobs
by alecco 827 days ago
Yes, but it's a bit more complicated. There are 2 FP8 formats: E5M2 and E4M3.

E5M2 is like an IEEE 754. But to compensate the smaller exponent, "E4M3’s dynamic range is extended by not representing infinities and having only one mantissa bit-pattern for NaNs".

Some people reported E4M3 is better for the forward pass (small range, more precision) and E5M2 is better for the backward pass (bigger range, less precision). And most implementations have some sort of scaling or other math tricks to shrink the error.

[0] FP8 Formats for Deep Learning (Nvidia/ARM/Intel) https://arxiv.org/abs/2209.05433

1 comments

Fair points! Ye Pytorch's fp8 experimental support does scaling of the gradients. Interesting point on a larger range for the forward pass, and a small range for the gradients! I did not know that - so learnt something today!! Thanks! I'll definitely read that paper!