Hacker News new | ask | show | jobs
by dragonwriter 161 days ago
Okay, yeah, and those manufacturers’ opinions are both obvious reflections of market position independent of the merits, what do people who actually run inference say?

(Also, the NPUs usually aren't any more separate from the GPU than tensor cores are separate from an Nvidia GPU, they are integrated with the CPU and iGPU.)

2 comments

If you're running an LLM there's a benefit in shifting prompt pre-processing to the NPU. More generally, anything that's memory-throughput limited should stay on the GPU, while the NPU can aid compute-limited tasks to at least some extent.

The general problem with NPUs for memory-limited tasks is either that the throughput available to them is too low to begin with, or that they're usually constrained to formats that will require wasteful padding/dequantizing when read (at least for newer models) whereas a GPU just does that in local registers.

Depends on how big the NPU is and how much power/memory the inference model needs.