by kevingadd
2672 days ago
Unless the paper calls it out somewhere I missed, the DNN operations aren't necessarily floating-point. Some networks use FP16 or FP32 (my understanding is that this is very common during training), but production inference on a trained network can run in INT8 or INT4. You can see this in what the 'Tensor' cores in modern GeForce cards expose and in what Google's latest cloud tensor cores support. NVIDIA's latest cores expose small matrices of FP16, INT8 and INT4 (I've seen some suggestions that they do FP32 as well, but it's not clear whether that's accurate), while Google's expose huge matrices in different formats (TPUv1 was apparently INT8; TPUv2 appears to be a mix of FP16 and FP32). In non-DNN image processing it's also quite common to use integers (IDCT, FFT, etc.) for the potential performance gains over floating point.
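To make the int8 inference point concrete, here's a minimal sketch of a quantized dot product. It assumes symmetric per-tensor quantization with a wide integer accumulator, which is one common scheme and not necessarily what any particular GPU or TPU does internally. The function names (`quantize_int8`, `int8_dot`) and the sample values are made up for illustration.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale.

    Assumption: symmetric per-tensor quantization (no zero point).
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def int8_dot(weights, activations):
    """Approximate a float dot product using only integer multiply-adds."""
    q_w, scale_w = quantize_int8(weights)
    q_a, scale_a = quantize_int8(activations)
    # All multiply-accumulate work happens on integers; real hardware
    # would typically accumulate int8 products into an int32 register.
    acc = sum(w * a for w, a in zip(q_w, q_a))
    # A single float rescale at the end recovers the approximate result.
    return acc * scale_w * scale_a


# Hypothetical sample data, just to compare against the float result.
weights = [0.5, -1.0, 0.25, 0.75]
activations = [1.0, 2.0, -3.0, 0.5]

exact = sum(w * a for w, a in zip(weights, activations))
approx = int8_dot(weights, activations)
print(f"float: {exact:.4f}  int8: {approx:.4f}")
```

The inner loop never touches a float, which is roughly where the throughput win comes from; the cost is a small rounding error that a trained network can usually tolerate.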