by bleke 2775 days ago
Sorry for being off topic, but does Intel calculate bonuses based on HN karma (more officially, "impact")? I keep seeing this bf16 story, and it looks like the authors are dying for a Christmas bonus.

To me it looks like a clever optimization. It has the same range as FP32 at half the size, with less precision, and it can be converted back and forth by truncating the low 16 bits or appending 16 zero bits.
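
For anyone who wants to see the truncate-and-pad conversion concretely, here is a minimal Python sketch. The helper names are made up for illustration, and it assumes plain truncation as described above; real hardware and libraries may round to nearest instead of truncating.

```python
import struct

def fp32_to_bits(x: float) -> int:
    # Reinterpret a float as its 32-bit IEEE 754 single-precision pattern.
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_fp32(b: int) -> float:
    # Reinterpret a 32-bit pattern as an IEEE 754 single-precision float.
    return struct.unpack("<f", struct.pack("<I", b))[0]

def fp32_to_bf16_bits(x: float) -> int:
    # Keep the top 16 bits: 1 sign bit, 8 exponent bits, 7 mantissa bits.
    # Assumption: truncation (no rounding), as in the comment above.
    return fp32_to_bits(x) >> 16

def bf16_bits_to_fp32(b: int) -> float:
    # Append 16 zero bits to get back a valid FP32 pattern.
    return bits_to_fp32((b & 0xFFFF) << 16)

if __name__ == "__main__":
    for value in (1.0, 3.14159, 1e38):
        bf16 = fp32_to_bf16_bits(value)
        print(f"{value!r} -> 0x{bf16:04X} -> {bf16_bits_to_fp32(bf16)!r}")
```

The exponent field is untouched, which is why the range matches FP32; only the low mantissa bits are lost, so the relative error from truncation stays below 2^-7 for normal values.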

Is anyone else using it?

Google uses it on their TPUs [0]. If you're interested in how it would affect the numerical stability of an algorithm you want to use, there is a Julia package that makes prototyping linear algebra over this datatype pretty straightforward [1].

[0] https://cloud.google.com/tpu/docs/system-architecture

[1] https://github.com/JuliaComputing/BFloat16s.jl

And Facebook is taking this even further. And while all these things are very cool, do not let ASIC designers claim they are barriers to entry for GPUs and CPUs. Whatever variants of this precision potpourri catch on are but a generation away from incarnation in general processors IMO...

https://code.fb.com/ai-research/floating-point-math/

Google's TPUs use them, and have for a year. I don't agree with the "new" or "Intel's" in the title.
And the TPU uses them because TensorFlow uses them; bfloat16 has been present since the first public commit: https://github.com/tensorflow/tensorflow/blob/f41959ccb2d9d4...
I would be extremely surprised if the motivation for putting bfloat16 in TensorFlow was not the TPU. That first public commit was ~1.5 years before TPUv2 was announced at I/O, so the TPU was almost certainly already in development.
bfloat16 was first in DistBelief, so it actually predates TensorFlow and TPUs (I worked on both systems). IIRC the motivation was more about minimizing parameter exchange bandwidth for large-scale CPU clusters rather than minimizing memory bandwidth within accelerators, but the idea generalized.
Thank you! I didn't know this. I thought they introduced them shortly after announcing TPU v1 at the 2016 (or 2017, can't remember) Google I/O.
Why is it clever to change the mantissa and exponent size? I thought the clever one was Nervana's Flexpoint, which seemed at least partially novel. And it's interesting that Intel isn't pushing that format, given that Nervana's ASIC had it.