by voz_
1088 days ago
Yeah, I spent a few months comparing the two, and empirically I had a lot more issues with various normalized entropy problems (explosion, not converging, converging more slowly) with fp16 than with bf16. The transfer pipeline I wrote for fp32->fp16 also took a lot more work than the one for fp32->bf16.
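One plausible reason for the difference, offered as background rather than as something measured in the comparison above: bf16 keeps fp32's 8 exponent bits and gives up mantissa bits, while fp16 has only 5 exponent bits, so values that are ordinary in fp32 can overflow or underflow when cast to fp16. Here is a minimal stdlib-only sketch of that. The helper names `to_fp16` and `to_bf16` are made up for illustration, the bf16 rounding is emulated by hand, and NaN inputs are not handled.

```python
import struct


def to_fp16(x: float) -> float:
    """Round-trip x through IEEE 754 half precision (fp16)."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses values past the fp16 range; treat that as
        # saturating to infinity, which is what a typical cast does.
        return float("inf") if x > 0 else float("-inf")


def to_bf16(x: float) -> float:
    """Emulate bfloat16 by keeping the top 16 bits of the fp32 encoding.

    Assumes x is finite and fits in fp32; NaN is not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, on the 16 bits being dropped.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


if __name__ == "__main__":
    for value in (70000.0, 1e-8, 1.001):
        print(f"{value!r:>10}  fp16={to_fp16(value)!r}  bf16={to_bf16(value)!r}")
```

In this sketch 70000.0 overflows fp16 (whose largest finite value is 65504) but stays finite in bf16, and 1e-8 flushes to zero in fp16 but not in bf16. The cost shows up at 1.001, where bf16 rounds to 1.0 and fp16 keeps more digits. That range gap is why fp32->fp16 conversions usually need extra care such as loss scaling, while fp32->bf16 is closer to a plain truncation.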