by dlewis1788 1086 days ago
Someone commented below that with enough batchnorm/layernorm/etc. and/or gradient clipping you can manage it, but BF16 just makes life easier if you can live without some precision.
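To make the range-versus-precision trade concrete, here is a rough sketch in plain Python. It assumes the thing being "managed" is FP16's narrow range (that context is not in this comment), and it only simulates BF16 by rounding float32 bit patterns. The helpers `to_bf16` and `to_fp16` are made up for illustration and are not from any library.

```python
import math
import struct


def to_bf16(x: float) -> float:
    """Simulate bfloat16: round a finite float to float32, then keep the top 16 bits.

    Hypothetical helper for illustration; uses round-to-nearest-even.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add half of the dropped range, plus the tie-breaking bit for round-to-even.
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFFFFFF
    bits &= 0xFFFF0000  # drop the low 16 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def to_fp16(x: float) -> float:
    """Round a finite float to IEEE half precision, mapping overflow to +/-inf.

    Hypothetical helper; struct's "e" format raises OverflowError when the
    value does not fit, so that case is turned into infinity here.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


if __name__ == "__main__":
    # Range: FP16 tops out at 65504, BF16 keeps float32's 8 exponent bits.
    print("70000 in fp16:", to_fp16(70000.0))
    print("70000 in bf16:", to_bf16(70000.0))

    # Precision: FP16 has 10 mantissa bits, BF16 only 7.
    print("1.001 in fp16:", to_fp16(1.001))
    print("1.001 in bf16:", to_bf16(1.001))
```

The large value overflows to infinity in FP16 but survives in BF16 (coarsely rounded), while the small offset from 1.0 survives in FP16 and is lost in BF16. That is the "live without some precision" part.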