Practically speaking, most models today run inference at 8-bit or 16-bit precision (occasionally 32-bit, but that's rare). You generally don't see an empirical lift from more bits of precision at inference time. Memory footprint matters far more.
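To put rough numbers on the memory point, here is a back-of-the-envelope sketch. It counts only the stored weights (no activations, caches, or overhead), and the 7-billion-parameter model is a made-up example, not any specific model.

```python
# Rough weight-memory estimate: parameter count times bytes per parameter.
# Assumption: only the weights are counted, and 1 GB = 1e9 bytes.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def weight_memory_gb(n_params: int, dtype: str) -> float:
    """Return the storage needed for the weights alone, in gigabytes."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9

# Hypothetical 7-billion-parameter model, chosen only for illustration.
N_PARAMS = 7_000_000_000
for dtype in ("fp32", "fp16", "int8"):
    print(dtype, weight_memory_gb(N_PARAMS, dtype), "GB")
```

Each halving of the bit width halves the weight footprint, which is often the difference between fitting on one device or not.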
Training is different. Gradients are sometimes small but meaningful, and if you constrain them to too few bits / degrees of freedom, they round to zero or lose the detail that backprop needs. This can hamper training and therefore the quality of the results.
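A minimal sketch of the small-gradient problem, using only Python's standard `struct` module to round a value to 16-bit and 32-bit floats. The gradient value `1e-8` is an arbitrary illustration, and the helper names are my own.

```python
import struct

def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (16-bit) value."""
    return struct.unpack("e", struct.pack("e", x))[0]

def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision (32-bit) value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Arbitrary small gradient, chosen to sit below fp16's smallest
# representable positive value (about 6e-8).
small_grad = 1e-8

grad_fp16 = to_fp16(small_grad)  # underflows to zero
grad_fp32 = to_fp32(small_grad)  # still nonzero

print("fp16 gradient:", grad_fp16)
print("fp32 gradient:", grad_fp32)
```

Once the gradient is exactly zero, that weight gets no update at all, no matter how many steps you run.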
You can also think of it as compounding error: at any one weight index the low-order bits might not mean much, but cascaded over a lot of tensor multiplications the rounding errors add up.
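Here is a toy illustration of the compounding effect. It uses a chain of scalar multiplications as a stand-in for stacked tensor operations, rounding to 16-bit after every step. The factor `1.0001` and the step count are arbitrary assumptions picked to make the effect visible.

```python
import struct

def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (16-bit) value."""
    return struct.unpack("e", struct.pack("e", x))[0]

def chained_product(factor: float, steps: int, round_each_step: bool) -> float:
    """Multiply 1.0 by `factor` repeatedly, optionally rounding to fp16 each step."""
    x = 1.0
    for _ in range(steps):
        x = x * factor
        if round_each_step:
            x = to_fp16(x)
    return x

def relative_error(approx: float, exact: float) -> float:
    return abs(approx - exact) / abs(exact)

FACTOR = 1.0001  # arbitrary toy value, smaller than fp16 can resolve near 1.0
STEPS = 1000     # arbitrary depth of the chain

one_step = relative_error(
    chained_product(FACTOR, 1, True), chained_product(FACTOR, 1, False)
)
many_steps = relative_error(
    chained_product(FACTOR, STEPS, True), chained_product(FACTOR, STEPS, False)
)

print("relative error after 1 step:    ", one_step)
print("relative error after 1000 steps:", many_steps)
```

A single rounding loses about one part in ten thousand here, which looks harmless, but the same loss repeated at every step leaves the low-precision result far off from the full-precision one.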