|
From what I understand, using (-1, 0, 1) removes multiplications in GPUs. Ie assume you have a weight matrix and multiply it by some activations:

                   [-1,  0,  1]
                   [ 0,  1, -1]
    [10, 20, 30] x [ 1,  1,  0]

Instead of doing 10(-1) + 20(0) + 30(1) + 10(0) + ..., since we know beforehand that the weights are only ever (-1, 0, 1), we can simply flip the sign and add, or force the hardware to do it directly: if (-1), subtract; if (0), skip the term; if (1), add.

Floating point multiplication adds the exponents and multiplies the mantissas. So, just simplifying, say a multiplier needs around E + M^2 "space" and an adder around E + M:

Float16 has E=5, M=10, so around 5 + 10^2 = 105 space.
Bfloat16 has E=8, M=7, so 8 + 7^2 = 57 space.
Float8(143) has E=4, M=3, so 4 + 3^2 = 13 space.
1.58 (16bit) has E=5, M=10. Addition only, so shift E, say 5 + 10 = 15 space.
1.58 (8bit) has E=4, M=3. Addition only, so shift E, say 4 + 3 = 7 space.

Obviously I'm simplifying, but with only additions, 1.58 uses say 7 space whilst FP8 uses 13 space, so in theory about 2x more adders can be crammed into the same transistor budget, ie roughly 2x more FLOPs than FP8. |
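To make the add/subtract idea concrete, here is a minimal Python sketch. It assumes the activations [10, 20, 30] form a row vector multiplied against a 3x3 weight matrix whose entries are all -1, 0 or 1. The function name `ternary_vecmat` is made up for illustration, and this only shows the arithmetic, not how a GPU would actually schedule it.

```python
def ternary_vecmat(x, W):
    """Compute the row-vector product x @ W without any multiplications.

    x: list of activations (length n)
    W: n x m list of lists whose entries are all -1, 0 or 1
    """
    n_rows, n_cols = len(W), len(W[0])
    if len(x) != n_rows:
        raise ValueError("x must have one entry per row of W")

    out = []
    for j in range(n_cols):
        acc = 0
        for i in range(n_rows):
            w = W[i][j]
            if w == 1:
                acc += x[i]      # weight +1: add the activation
            elif w == -1:
                acc -= x[i]      # weight -1: subtract the activation
            elif w != 0:
                raise ValueError("weights must be -1, 0 or 1")
            # weight 0: skip the term entirely
        out.append(acc)
    return out


x = [10, 20, 30]
W = [
    [-1, 0, 1],
    [0, 1, -1],
    [1, 1, 0],
]
print(ternary_vecmat(x, W))  # [20, 50, -10]
```

The first output is 10(-1) + 20(0) + 30(1) = 20, computed as -10 + 30 with no multiply at all.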
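The back-of-envelope numbers can be reproduced with a few lines of Python. This is only the toy cost model from the comment (multiplier about E + M^2, adder about E + M), not a real transistor count. The helper names `multiply_cost` and `add_only_cost` are hypothetical.

```python
def multiply_cost(exp_bits, man_bits):
    # Toy model: add the exponents (E) and multiply the mantissas (M^2).
    return exp_bits + man_bits ** 2


def add_only_cost(exp_bits, man_bits):
    # Toy model for the ternary case: shift the exponent (E) and add the mantissa (M).
    return exp_bits + man_bits


# (exponent bits, mantissa bits) for each format mentioned above
formats = {
    "float16": (5, 10),
    "bfloat16": (8, 7),
    "float8 (143)": (4, 3),
}

for name, (e, m) in formats.items():
    print(f"{name}: multiply ~{multiply_cost(e, m)}, add-only ~{add_only_cost(e, m)}")

# Ratio of FP8 multiplier cost to 8-bit add-only cost under this model
ratio = multiply_cost(4, 3) / add_only_cost(4, 3)
print(f"FP8 multiply vs 8-bit add-only: {ratio:.2f}x")
```

Under this model the FP8 ratio is 13 / 7, which is a bit under 2x, so the "2x" above is a rounded figure.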