by eklitzke 1496 days ago
Just to add to this, the reason these inference accelerators have become big recently (see also the "neural core" in Pixel phones) is that they help run inference tasks in real time (lower model latency) with better power usage than a GPU.

As a concrete example, on a camera you might want to run a face detector so the camera can automatically adjust its focus when it sees a human face. Or you might want a person detector that can detect the outline of the person in the shot, so that you can blur/change their background in something like a Zoom call. All of these applications are going to work better if you can run your model at, say, 60 Hz instead of 20 Hz. Optimizing hardware to do inference tasks like this as fast as possible with the least possible power usage is pretty different from optimizing for all the things a GPU needs to do, so you might end up with hardware that has both and uses them for different tasks.
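
To put rough numbers on the 60 Hz vs. 20 Hz point: the frame rate fixes how long the model has to finish each inference. Here is a quick Python sketch of that arithmetic. The function names are just illustrative (not from any library), and it assumes one inference per frame with no other per-frame work.

```python
def frame_budget_ms(rate_hz: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / rate_hz


def max_sustained_rate_hz(latency_ms: float) -> float:
    """Highest frame rate a model can keep up with, given its per-frame latency.

    Assumes exactly one inference per frame and nothing else in the loop.
    """
    return 1000.0 / latency_ms


for rate in (20, 60):
    print(f"{rate} Hz -> {frame_budget_ms(rate):.1f} ms per frame")
# prints:
# 20 Hz -> 50.0 ms per frame
# 60 Hz -> 16.7 ms per frame
```

So going from 20 Hz to 60 Hz means the whole inference has to fit in about 16.7 ms instead of 50 ms, which is the kind of latency target this dedicated hardware is built around.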


Thank you @iamaaditya and @eklitzke. Very informative.