That's great and all, but nobody needs a 32-bit anything in 2018. This undergraduate paper provides a magic number and associated error bound for 64-bit doubles:
That's not really accurate. Even in cases where 32-bit and 64-bit operations are equally fast on the CPU, 32-bit values still take up half the memory. For many workloads, the limiting factor is cache space. So, if you can use 32-bit values, you can get much better performance for those workloads.
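To put a rough number on the cache argument, here is a minimal sketch. The 32 KiB cache size and the helper name `elements_that_fit` are my own assumptions for illustration, not figures for any particular CPU.

```python
import numpy as np

# Hypothetical cache size for illustration only; real L1 data caches vary by CPU.
CACHE_BYTES = 32 * 1024


def elements_that_fit(dtype, cache_bytes=CACHE_BYTES):
    """Return how many values of `dtype` fit in `cache_bytes` bytes."""
    return cache_bytes // np.dtype(dtype).itemsize


fit32 = elements_that_fit(np.float32)  # 4 bytes per value
fit64 = elements_that_fit(np.float64)  # 8 bytes per value
print(f"float32 values per cache: {fit32}")
print(f"float64 values per cache: {fit64}")
```

Whatever the real cache size is, the ratio stays at 2:1, so a working set that spills out of cache as doubles may still fit as floats.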
And if you’re doing heavy floating-point work, a vector register holds twice as many 32-bit floats as doubles, and the vectorized operations run roughly as fast for both forms, yielding an approximate doubling of speed.
assuming 1) memory bandwidth is the bottleneck and 2) you can keep the tensor values in cache or registers.
I think that GPUs are still vector processing engines, so they should scale with 4x... But assuming Google architected the TPU correctly, it should be 16x as fast (I think the architecture is actually that of a rank-2 tensor).
Lots of ML and AI applications are using ever-smaller precisions. Half and even quarter-precision floats are able to maximize efficiency of the various CPU/GPU ALUs.
I was going to mention that... Just because we have ridiculous transistor budgets doesn't mean there aren't problems where you need/want to push the envelope for performance instead of precision. If anything, it grows the applicable problem space.
Even scientific calculation would be fine with 32-bit floats, but average floating-point error due to representation creeps up as O(N) (iirc) over N multiplications, so you have to use 64-bit for many scientific applications to get satisfactory results after a million or a trillion multiplications.
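For anyone who wants to see the drift rather than take it on faith, here is a rough sketch. The factor range, seed, and chain length are arbitrary choices of mine, and the float64 product is only treated as a reference because its own rounding error is many orders of magnitude smaller. The standard worst-case bound grows roughly like N times the unit roundoff; what a random run like this shows is typically well under that bound.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability
n = 100_000                     # arbitrary chain length

# Factors close to 1, rounded to float32 first, so both precisions start from
# identical inputs and only the rounding of the multiplications differs.
factors32 = rng.uniform(0.999, 1.001, n).astype(np.float32)
factors64 = factors32.astype(np.float64)

# Running products: one chain rounds to float32 at every step, the other to float64.
prod32 = np.cumprod(factors32, dtype=np.float32)
prod64 = np.cumprod(factors64, dtype=np.float64)

# Relative error of the float32 chain, using the float64 chain as the reference.
rel_err = np.abs(prod32.astype(np.float64) - prod64) / np.abs(prod64)

for k in (10, 1_000, 100_000):
    print(f"after {k:>7} multiplications: relative error {rel_err[k - 1]:.3e}")
```

The same experiment with float64 against a higher-precision reference would show the same shape of growth, just starting from an error roughly nine orders of magnitude smaller.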
I wanted to put over a billion floats in a numpy array just a few months ago. Making them 16-bit saved a lot of memory.
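A back-of-the-envelope version of that saving, as a sketch: the count of exactly one billion and the helper name `bytes_needed` are assumptions of mine, and nothing is actually allocated here. The `np.finfo` lines show what you give up for it.

```python
import numpy as np

n = 1_000_000_000  # assumed round number; the array above had "over a billion"


def bytes_needed(count, dtype):
    """Size in bytes of `count` values of `dtype`, computed without allocating."""
    return count * np.dtype(dtype).itemsize


for dt in (np.float64, np.float32, np.float16):
    gib = bytes_needed(n, dt) / 2**30
    print(f"{np.dtype(dt).name}: {gib:.2f} GiB")

# The trade-off: float16 has a small range and coarse resolution.
half = np.finfo(np.float16)
print(f"float16 max: {half.max}, machine epsilon: {half.eps}")
```

Whether that trade is acceptable depends entirely on the data: values that stay within the float16 range and tolerate about three decimal digits of precision are fine, anything else needs a wider type.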
It doesn't matter how much resource limits increase, people are going to keep hitting them. And when they hit them, using a smaller data type will always help.
That's not relevant; there are plenty of single-precision float applications today (and many fixed-point applications as well). It all depends on your workload.