Hacker News
by
sillysaurusx
1086 days ago
At that point it’d be better to do everything in fp32. The hardware can’t do bf16 in the way you’re saying; the conversions would consume all your time.
2 comments
BooneJS
1086 days ago
Compute in F32, but then round and pack a pair of BF16 into 4 bytes.
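To make the "round and pack" step concrete, here is a rough sketch in Python that works on raw bit patterns. The function names are made up for illustration, round-to-nearest-even is an assumed rounding mode (the comment does not say which one), and NaN inputs are not handled.

```python
import struct

def f32_bits(x):
    """Return the IEEE 754 binary32 bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def f32_to_bf16_rne(x):
    """Round an f32 value to a 16-bit bf16 pattern (nearest, ties to even).

    Assumption: NaN is not special-cased in this sketch.
    """
    bits = f32_bits(x)
    # Add 0x7FFF plus the lowest kept bit so exact ties round to even.
    rounding = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding) >> 16) & 0xFFFF

def pack_bf16_pair(lo, hi):
    """Pack two rounded bf16 values into one 32-bit word (4 bytes)."""
    return (f32_to_bf16_rne(hi) << 16) | f32_to_bf16_rne(lo)

packed = pack_bf16_pair(1.0, -2.0)
print(hex(packed))
```

Which value goes in the low half and which in the high half is an arbitrary choice here; real hardware or libraries may order them differently.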
brrrrrm
1085 days ago
The conversions are just a mask and shift? Super cheap
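For reference, a small Python sketch of what "mask and shift" could look like at the bit level. This assumes two bf16 values stored in one 32-bit word and assumes truncation (not rounding) when narrowing; the helper names are hypothetical. It only illustrates the bit manipulation and says nothing about how costly it is on any particular hardware.

```python
import struct

def bits_to_f32(bits):
    """Reinterpret a 32-bit unsigned int as an IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def unpack_bf16_pair(word):
    """Widen two packed bf16 values to f32.

    bf16 is the top 16 bits of an f32, so widening just places those
    16 bits in the upper half of a 32-bit word and zeroes the rest.
    """
    lo = bits_to_f32((word & 0xFFFF) << 16)   # mask low half, shift up
    hi = bits_to_f32(word & 0xFFFF0000)       # mask high half, already in place
    return lo, hi

def f32_to_bf16_truncate(x):
    """Narrow f32 to bf16 by dropping the low 16 bits (assumed: no rounding)."""
    return struct.unpack("<I", struct.pack("<f", x))[0] >> 16

print(unpack_bf16_pair(0xC0003F80))
```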