by sabhiram
1078 days ago
Fascinating paper. We design an inference accelerator that more or less accomplishes this by quantizing input tensors into logarithmic space. This allows multiplication (in convolution especially) to be reduced to very simple adders, since the product of two values becomes the sum of their logarithms. This (and a few other tricks) has a dramatic impact on the compute density we achieve while keeping power very low. We keep the tensors in our quantized space throughout the layers of the network and convert the outputs as required on the way out of the ASIC. We achieve impressive task-level performance, but this requires some specialized training and model optimizations. Very cool to see ideas like this propagate more into the mainstream.
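
For anyone unfamiliar with the trick, here is a minimal Python sketch of the general log-domain idea. It is not the accelerator's actual scheme: the base-2 encoding, integer exponents, clamp range, and all function names (`log_quantize`, `log_multiply`, `dequantize`, `log_dot`) are assumptions made up for illustration.

```python
import math

def log_quantize(x, min_exp=-8, max_exp=7):
    """Approximate x as sign * 2**e with an integer exponent e.

    Returns (sign, e). Zero is encoded as sign 0.
    The exponent range is an arbitrary assumption for this sketch.
    """
    if x == 0:
        return (0, 0)
    sign = 1 if x > 0 else -1
    e = round(math.log2(abs(x)))          # nearest power of two (in log space)
    e = max(min_exp, min(max_exp, e))     # clamp to the representable range
    return (sign, e)

def log_multiply(a, b):
    """Multiply two log-quantized values: the exponents are simply added.

    The sign product here stands in for what would be a sign-bit XOR in hardware.
    """
    sign_a, exp_a = a
    sign_b, exp_b = b
    return (sign_a * sign_b, exp_a + exp_b)

def dequantize(q):
    """Convert a (sign, exponent) pair back to an ordinary float."""
    sign, e = q
    return sign * 2.0 ** e

def log_dot(xs, ws):
    """Dot product where each elementwise product is an exponent addition.

    Accumulation is done in the linear domain in this sketch.
    """
    total = 0.0
    for x, w in zip(xs, ws):
        total += dequantize(log_multiply(log_quantize(x), log_quantize(w)))
    return total

# Example: 3.0 is rounded to 2**2 and -0.5 is exactly -(2**-1),
# so the log-domain product is -(2**1) rather than the exact -1.5.
approx = dequantize(log_multiply(log_quantize(3.0), log_quantize(-0.5)))
```

The rounding to powers of two is where the accuracy loss comes from, which is presumably why specialized training matters. The sketch accumulates sums in ordinary floating point; how a real design handles accumulation is not something the comment above specifies.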