There's a wide range of inference accelerators in commercial use. For "edge" or embedded applications, the Google Coral Edge TPU is a useful reference point: it is capable of up to 4 trillion operations per second (4 TOPS) while consuming up to 2 W (2 TOPS/W), but it is limited to INT8 operations. It also has around 8 MB of memory for model storage. Meanwhile, a general-purpose or gaming GPU supports a wider range of instructions (single-precision and double-precision floating point, integer, etc.). The GeForce GTX 1060, for example, delivers 4.375 TFLOPS (FP32) at 120 W
(https://www.techpowerup.com/gpu-specs/geforce-gtx-1060-6-gb....). There are also commercial products optimized for particular operations and precisions. Here's a blog post discussing Google's first-generation TPU, an ASIC used in its datacenters:
https://cloud.google.com/blog/products/ai-machine-learning/a... (92 TOPS at 700 MHz, 40 W); see also https://arxiv.org/abs/1704.04760
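To put the figures above side by side, here is a quick back-of-the-envelope sketch that divides peak throughput by power. It only uses the numbers quoted in this comment; the `ACCELERATORS` table and `efficiency` helper are my own illustrative names, and labeling the TPU v1 as INT8 reflects its 8-bit integer matrix unit.

```python
# Peak figures quoted above: vendor/paper peak numbers, not measured
# sustained inference throughput.
ACCELERATORS = {
    # name: (peak tera-ops per second, watts, precision)
    "Coral Edge TPU": (4.0, 2.0, "INT8"),
    "GeForce GTX 1060": (4.375, 120.0, "FP32"),
    "TPU v1": (92.0, 40.0, "INT8"),
}


def efficiency(tera_ops: float, watts: float) -> float:
    """Return peak tera-operations per second per watt."""
    return tera_ops / watts


if __name__ == "__main__":
    for name, (tops, watts, precision) in ACCELERATORS.items():
        # One row per device: name, precision, and peak T(FL)OPS per watt.
        print(f"{name:18s} {precision:5s} {efficiency(tops, watts):6.3f} T(FL)OPS/W")
```

This gives about 2.0 TOPS/W for the Edge TPU, about 0.036 TFLOPS/W for the GTX 1060 and 2.3 TOPS/W for the TPU v1. Keep in mind that an INT8 operation and an FP32 floating-point operation aren't equivalent units of work, so the gap mostly illustrates what trading precision and generality buys you in efficiency.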