by oranlooney 2900 days ago
That's great and all, but nobody needs a 32-bit anything in 2018. This undergraduate paper provides a magic number and associated error bound for 64-bit doubles:

https://cs.uwaterloo.ca/~m32rober/rsqrt.pdf
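
For anyone who wants to see the shape of the trick for doubles, here is a minimal Python sketch. The value of `MAGIC_64` is the constant commonly attributed to the linked paper; treat it as an assumption and check it against the PDF. The name `fast_rsqrt64` and the single Newton-Raphson refinement step are illustrative choices, not the paper's code, and the sketch only handles positive, finite, normal inputs.

```python
import struct

# Assumption: the 64-bit constant commonly cited from the linked paper.
MAGIC_64 = 0x5FE6EB50C7B537A9


def fast_rsqrt64(x: float) -> float:
    """Approximate 1/sqrt(x) for a positive, normal double."""
    # Reinterpret the double's bits as an unsigned 64-bit integer.
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    # Halve the exponent and negate it with one shift and one subtraction.
    bits = MAGIC_64 - (bits >> 1)
    # Reinterpret the integer back as a double: this is the initial guess.
    y = struct.unpack("<d", struct.pack("<Q", bits))[0]
    # One Newton-Raphson step on f(y) = 1/y^2 - x.
    return y * (1.5 - 0.5 * x * y * y)


if __name__ == "__main__":
    for value in (0.25, 1.0, 2.0, 4.0, 100.0):
        approx = fast_rsqrt64(value)
        print(value, approx, value ** -0.5)
```

In Python this is of course slower than `x ** -0.5`; the point is only to show the bit manipulation, which in C would be a reinterpreting copy, a shift, and a subtract.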

13 comments

That's not really accurate. Even in cases where 32-bit and 64-bit operations are equally fast on the CPU, 32-bit values still take up half the memory. For many workloads, the limiting factor is cache space. So, if you can use 32-bit values, you can get much better performance for those workloads.
And if you’re doing heavy floating point work, you can fit twice as many operations in with a 32-bit float vector as an equally sized double vector, and the vectorized operations happen roughly as fast for both forms, yielding an approximate doubling of speed.
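
A quick way to check the "half the memory, twice the lanes" arithmetic, as a stdlib-only sketch. The 256-bit register width is an assumption for illustration (the figure for AVX); the variable names are made up here.

```python
from array import array

N = 1_000_000

# array("f") stores C floats (32-bit), array("d") stores C doubles (64-bit).
singles = array("f", [0.0]) * N
doubles = array("d", [0.0]) * N

bytes_f32 = singles.itemsize * len(singles)
bytes_f64 = doubles.itemsize * len(doubles)

# Assumption: a 256-bit SIMD register, used only to count lanes.
REGISTER_BITS = 256
lanes_f32 = REGISTER_BITS // (8 * singles.itemsize)
lanes_f64 = REGISTER_BITS // (8 * doubles.itemsize)

if __name__ == "__main__":
    print(f"float32: {bytes_f32} bytes, {lanes_f32} lanes per register")
    print(f"float64: {bytes_f64} bytes, {lanes_f64} lanes per register")
```

Half the bytes means twice as many values per cache line, and twice the lanes means twice as many values per vector instruction, which is where the rough 2x comes from.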
for rank-2 tensor work you can do 4x as many operations, for rank-3 tensor work, it's 8x, assuming memory bandwidth is the bottleneck.
Does that mean it’s 64x as fast for 16-bit floating point vs 64-bit for a rank 3 tensor?
assuming 1) memory bandwidth is the bottleneck and 2) you can keep the tensor values in cache or registers.

I think that GPUs are still vector processing engines, so they should scale with 4x... But assuming Google architected the TPU correctly, it should be 16x as fast (I think the architecture is actually that of a rank-2 tensor).

This "nobody needs a 32-bit anything in 2018" seems like a weird opposite of "640K should be enough for anyone".

https://www.wired.com/1997/01/did-gates-really-say-640k-is-e...

Lots of ML and AI applications are using ever-smaller precisions. Half and even quarter-precision floats are able to maximize efficiency of the various CPU/GPU ALUs.
I was going to mention that... Just because we have ridiculous transistor budgets doesn't mean there aren't problems where you need/want to push the envelope for performance instead of precision. If anything, it grows the applicable problem space.
> That's great and all, but nobody needs a 32-bit anything in 2018.

Then why do x86-64 integer instructions default to a 32-bit operand size when the REX prefix byte is not present?

You can double x86 FP throughput using 32-bit floats versus 64 bit ones.

For GPUs, the 32-bit float performance advantage can be more than 4-10x (sometimes a lot more).

TIL, no one in the gaming industry uses 32 bit floats any longer. /s
Funny, in 2018 a lot of people are asking for 16-bit floats.

https://en.wikipedia.org/wiki/Half-precision_floating-point_...

This is not true. In games 32-bit floats are extremely common.
Nobody needs absolutes in 2018.
Even scientific calculation would be fine with 32-bit floats, but average floating point error due to representation creeps with O(N) (iirc) over N multiplications, so you have to use 64-bit for many scientific applications to get satisfactory results after a million or a trillion multiplications.
Not really - https://en.wikipedia.org/wiki/Numerical_stability

If your algorithm is not stable then even 64-bit won't help you.

Compare Euler vs Verlet - https://en.wikipedia.org/wiki/Verlet_integration
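
To make the stability point concrete, here is a small sketch that runs both integrators in ordinary 64-bit floats on a unit harmonic oscillator (x'' = -x). The test problem, step size, and function names are assumptions picked for illustration. Explicit Euler multiplies the energy by (1 + dt^2) every step, so it drifts regardless of precision, while velocity Verlet keeps the energy bounded.

```python
def energy(x: float, v: float) -> float:
    """Total energy of a unit-mass, unit-stiffness oscillator."""
    return 0.5 * (x * x + v * v)


def euler(x: float, v: float, dt: float, steps: int):
    """Explicit (forward) Euler: both updates use the old state."""
    for _ in range(steps):
        x, v = x + dt * v, v - dt * x
    return x, v


def velocity_verlet(x: float, v: float, dt: float, steps: int):
    """Velocity Verlet with acceleration a = -x."""
    a = -x
    for _ in range(steps):
        x = x + dt * v + 0.5 * dt * dt * a
        a_new = -x
        v = v + 0.5 * dt * (a + a_new)
        a = a_new
    return x, v


DT, STEPS = 0.01, 10_000
e_start = energy(1.0, 0.0)
e_euler = energy(*euler(1.0, 0.0, DT, STEPS))
e_verlet = energy(*velocity_verlet(1.0, 0.0, DT, STEPS))

if __name__ == "__main__":
    print("start :", e_start)
    print("euler :", e_euler)
    print("verlet:", e_verlet)
```

Both runs use doubles throughout, so the gap between them comes from the method and not from the float width.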

You're making a different argument.
Which problem that has stable algorithm would require 64-bit then?
What they typically do in 3d gaming is update the matrix that holds the transformation by a left multiplication, every time the camera changes. So

T_n = U_{n-1} * U_{n-2} * ... * U_0 * T_0

After a while, your matrix accumulates errors, but it's easy to just start over with a fresh one.
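
A rough sketch of that drift and of the "fresh matrix" fix, using a 2D rotation to keep it short. Assumptions: matrix entries are stored as 32-bit floats (emulated by rounding through `struct`, with the arithmetic inside each entry done in double), the camera turns by a fixed small angle each frame, and the helper names are invented for this example.

```python
import math
import struct


def f32(x: float) -> float:
    """Round a Python float to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def rotation(theta: float):
    """2x2 rotation matrix with entries stored as 32-bit floats."""
    c, s = math.cos(theta), math.sin(theta)
    return [[f32(c), f32(-s)], [f32(s), f32(c)]]


def matmul(a, b):
    """2x2 product, rounding each stored entry to 32 bits."""
    return [[f32(a[i][0] * b[0][j] + a[i][1] * b[1][j]) for j in range(2)]
            for i in range(2)]


def orthogonality_error(m) -> float:
    """Largest absolute entry of M^T M - I (zero for a perfect rotation)."""
    worst = 0.0
    for i in range(2):
        for j in range(2):
            dot = m[0][i] * m[0][j] + m[1][i] * m[1][j]
            worst = max(worst, abs(dot - (1.0 if i == j else 0.0)))
    return worst


STEP, FRAMES = 0.001, 100_000

# Incremental update: T_n = U * T_{n-1}, repeated every frame.
update = rotation(STEP)
t = rotation(0.0)
for _ in range(FRAMES):
    t = matmul(update, t)
drift = orthogonality_error(t)

# Fresh start: rebuild the matrix directly from the accumulated angle.
fresh_err = orthogonality_error(rotation(STEP * FRAMES))

if __name__ == "__main__":
    print("after incremental updates:", drift)
    print("rebuilt from angle       :", fresh_err)
```

The rebuilt matrix only carries the rounding error of its own entries, which is why tracking the underlying parameters (here one angle) and regenerating the matrix is a cheap reset.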

> Even scientific calculation would be fine with 32 bit floats

It really depends on the algorithms in question and the error tolerances.
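
One way to put numbers on this is to run a chain of multiplications in both precisions and compare against an exact rational result. This is a sketch under assumptions: the factor 1.001 and the chain length are arbitrary, 32-bit arithmetic is emulated by rounding each product through `struct`, and the helper names are made up.

```python
import struct
from fractions import Fraction


def to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def chained_product(factor: float, n: int, round_fn) -> float:
    """Multiply by `factor` n times, rounding after every step."""
    acc = 1.0
    for _ in range(n):
        acc = round_fn(acc * factor)
    return acc


N = 10_000
# Round the factor first so both precisions multiply the same exact value.
FACTOR = to_f32(1.001)

# Exact reference, computed with rational arithmetic.
exact = Fraction(FACTOR) ** N


def rel_err(value: float) -> float:
    return float(abs(Fraction(value) / exact - 1))


err32 = rel_err(chained_product(FACTOR, N, to_f32))
err64 = rel_err(chained_product(FACTOR, N, lambda v: v))

if __name__ == "__main__":
    print("float32 relative error:", err32)
    print("float64 relative error:", err64)
    print("worst-case bounds     :", N * 2.0 ** -24, N * 2.0 ** -53)
```

The worst-case bound grows linearly with the number of multiplications in either precision; whether the 32-bit figure is acceptable is exactly the "error tolerances" question.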

deep learning uses low precision floats

sometimes as few as 8 bits are needed

I think gen 1 or gen 2 of the TPU explicitly supported short ints.
Realtime 3D still uses floats, but only when we can afford something so big; s10e5 is better where available.
I think this article is from 2010.
I wanted to put over a billion floats in a numpy array just a few months ago. Making them 16-bit saved a lot of memory.

It doesn't matter how much resource limits increase, people are going to keep hitting them. And when they hit them, using a smaller data type will always help.
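
The back-of-the-envelope numbers for a billion values, as a stdlib-only sketch (the helper names are invented; with NumPy the same choice is the `dtype`, e.g. `np.float16`). It also shows what the smaller type costs in precision.

```python
import struct

COUNT = 1_000_000_000  # "over a billion floats", rounded down for the arithmetic

# struct format codes: "e" is IEEE half, "f" is single, "d" is double.
FORMATS = {"float16": "<e", "float32": "<f", "float64": "<d"}


def footprint_gb(fmt: str, count: int) -> float:
    """Raw storage in gigabytes (10^9 bytes) for `count` values."""
    return struct.calcsize(fmt) * count / 1e9


def roundtrip(fmt: str, value: float) -> float:
    """Store `value` in the given format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


if __name__ == "__main__":
    for name, fmt in FORMATS.items():
        print(name, footprint_gb(fmt, COUNT), "GB;",
              "0.1 is stored as", roundtrip(fmt, 0.1))
```

Going from 64-bit to 16-bit cuts the array from 8 GB to 2 GB, at the price of roughly three significant decimal digits per value.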

That's not relevant; there are plenty of single-precision float applications today (and many fixed-point applications as well). It all depends on your workload.
> nobody needs a 32-bit anything in 2018

Tell us more about this strange "2018" place!