by alevskaya
1354 days ago
There's a big difference between not caring about stability and being willing to trade precision for better memory bandwidth in an application that doesn't benefit from increased precision. When doing large training jobs on TPUs, stability is paramount! It's true that you have to know more about what you're doing when you reduce bit-depth - the horrors of floating point are harder to ignore, and it's wildly inappropriate for many scientific computations. However, the reduction of bit-depth is likely to continue as we seek to make modern models more efficient and economical to train and use.