by voz_ 1088 days ago
Yeah, I spent a few months comparing the two, and empirically I had a lot more issues with various normalized entropy problems (exploding, not converging, converging more slowly) with fp16 than with bf16.

The transfer pipeline I wrote for fp32->fp16 also took a lot more work than the one for fp32->bf16.
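
One likely reason for the difference, sketched roughly: bf16 keeps fp32's 8 exponent bits, so a conversion only has to round away mantissa bits, while fp16 has 5 exponent bits and a largest finite value of 65504, so large values overflow and tiny ones flush to zero unless the pipeline rescales or clamps them. The snippet below is only an illustration using Python's standard `struct` module, not the pipeline described above; the helper names `to_fp16` and `to_bf16` are made up for this example, and the bf16 rounding ignores NaN handling.

```python
import struct


def to_fp16(x: float) -> float:
    """Round-trip x through IEEE half precision (hypothetical helper)."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack values beyond the fp16 range;
        # hardware casts would produce +/-inf here instead.
        return float("inf") if x > 0 else float("-inf")


def to_bf16(x: float) -> float:
    """Round-trip x through bfloat16 by keeping the top 16 bits of fp32.

    Hypothetical helper: round-to-nearest-even, no special NaN handling.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add the rounding bias, then drop the low 16 mantissa bits.
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


if __name__ == "__main__":
    for value in (1e5, 1e-8, 3.14159):
        print(value, "fp16:", to_fp16(value), "bf16:", to_bf16(value))
```

With these helpers, 1e5 overflows to inf in fp16 and 1e-8 underflows to 0.0, while bf16 keeps both finite and nonzero at reduced precision. That is the kind of case an fp32->fp16 path has to handle explicitly (for example with loss scaling) and an fp32->bf16 path mostly does not.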