by moandcompany 625 days ago
There's a wide range of inference accelerators in commercial use.

For "edge" or embedded applications, the Google Coral Edge TPU is a useful reference point: it is capable of up to 4 trillion operations per second (4 TOPS) at up to 2 watts of power consumption (2 TOPS/W). However, the accelerator is limited to INT8 operations, and it has around 8 MB of memory for model storage.

Meanwhile, a general-purpose or gaming GPU can support a wider range of instructions (single-precision and double-precision floating point, integer, etc.).

GeForce GTX 1060, for example: 4.375 TFLOPS (FP32) @ 120W (https://www.techpowerup.com/gpu-specs/geforce-gtx-1060-6-gb....)

There are also commercially oriented products that are optimized for particular operations and precisions.

Here's a blog post discussing Google's 1st-generation ASIC TPU used in its datacenters: https://cloud.google.com/blog/products/ai-machine-learning/a...

(92 TOPS @ 700 MHz, 40W)

https://arxiv.org/abs/1704.04760
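
To put the three figures quoted above side by side, here is a minimal sketch that divides each throughput number by its power number. The dictionary and the `ops_per_watt` helper are made up for illustration, the values are only the ones cited in this comment, and the ratios are a rough guide at best: the Edge TPU and first-generation TPU numbers are INT8 operations, while the GTX 1060 number is FP32.

```python
# Throughput (tera-ops/s) and power (W) as quoted in the comment above.
# The precision differs per device, so this is not an apples-to-apples comparison.
accelerators = {
    "Coral Edge TPU (INT8)": (4.0, 2.0),
    "GeForce GTX 1060 (FP32)": (4.375, 120.0),
    "Google TPU v1 (INT8)": (92.0, 40.0),
}


def ops_per_watt(tera_ops: float, watts: float) -> float:
    """Return tera-operations per second per watt."""
    return tera_ops / watts


efficiency = {
    name: ops_per_watt(tera_ops, watts)
    for name, (tera_ops, watts) in accelerators.items()
}

for name, value in efficiency.items():
    print(f"{name}: {value:.3f} tera-ops/s per watt")
```

On these quoted numbers the two INT8 ASICs land around 2 tera-ops/s per watt, and the FP32 GPU figure is well under a tenth of that.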

1 comments

Sorry, I'm not familiar with TPUs, only GPUs, but how much VRAM do Corals have? YOLO 11x is 56M params, which would still be 56MB even if it were quantized to int8. Plus you would need some memory for your inputs.
The Coral Edge TPU has approximately 8MB of SRAM for model weights/parameters.

https://coral.ai/docs/accelerator/datasheet/
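
The size arithmetic from the question can be sketched as below. The function and constant names are hypothetical, the parameter count is the 56M figure from the question, and the SRAM size is the approximate 8 MB figure above. The sketch counts weights only and ignores activations, inputs, and any quantization metadata.

```python
def weight_bytes(num_params: int, bits_per_param: int) -> int:
    """Bytes needed to store the weights alone at a given precision."""
    return num_params * bits_per_param // 8


YOLO11X_PARAMS = 56_000_000        # parameter count as quoted in the question
EDGE_TPU_SRAM_BYTES = 8 * 10**6    # "approximately 8MB", taken as 8 million bytes

int8_size = weight_bytes(YOLO11X_PARAMS, 8)    # one byte per parameter
fp32_size = weight_bytes(YOLO11X_PARAMS, 32)   # four bytes per parameter
fits = int8_size <= EDGE_TPU_SRAM_BYTES

print(f"INT8 weights: {int8_size / 1e6:.0f} MB")
print(f"FP32 weights: {fp32_size / 1e6:.0f} MB")
print(f"Fits in on-chip SRAM: {fits}")
```

Under these assumptions the INT8 weights alone are about seven times the on-chip SRAM figure.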

It does not have VRAM as it is not a graphics card :)

There are examples and instructions for exporting YOLO variants to run on the Edge TPU: https://docs.ultralytics.com/guides/coral-edge-tpu-on-raspbe...