by dragonwriter
161 days ago
Okay, yeah, and those manufacturers' opinions are both obvious reflections of market position, independent of the merits. What do people who actually run inference say? (Also, NPUs usually aren't any more separate from the GPU than tensor cores are from an Nvidia GPU; they're integrated on the same chip as the CPU and iGPU.)

The general problem with NPUs for memory-bound tasks is either that the memory throughput available to them is too low to begin with, or that they're usually constrained to formats that require wasteful padding or dequantizing when weights are read (at least for newer models), whereas a GPU just does that in local registers.
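To put rough numbers on why padding or pre-dequantizing hurts, here's a back-of-envelope sketch in Python. It assumes decoding is purely memory-bandwidth-bound (every weight streamed from memory once per generated token) and uses made-up figures: a hypothetical 7B-parameter model, a hypothetical 100 GB/s of memory bandwidth, and 4-bit weights with one fp16 scale per group of 32. The format labels are illustrative, not any particular NPU's supported types.

```python
# Back-of-envelope model: in memory-bandwidth-bound decoding, each generated
# token requires streaming (roughly) every weight from memory once.
# All numbers below are illustrative assumptions, not measurements.

N_PARAMS = 7e9          # hypothetical 7B-parameter model
BANDWIDTH_GB_S = 100.0  # hypothetical memory bandwidth, GB/s

# Effective storage cost per weight, in bits, for a few layouts.
FORMATS = {
    # 4-bit weights plus one fp16 scale per group of 32 weights,
    # read as-is and dequantized on the fly in registers.
    "int4 + fp16 scale/32 (dequantized in registers)": 4 + 16 / 32,
    # Same 4-bit values padded out to 8 bits to fit a supported format.
    "int4 padded to int8": 8.0,
    # Weights dequantized ahead of time to fp16.
    "pre-dequantized fp16": 16.0,
}


def bytes_per_token(n_params: float, bits_per_weight: float) -> float:
    """Bytes read from memory to generate one token (weights only)."""
    return n_params * bits_per_weight / 8


def max_tokens_per_s(bandwidth_gb_s: float, n_params: float,
                     bits_per_weight: float) -> float:
    """Upper bound on decode speed if memory bandwidth is the only limit."""
    return bandwidth_gb_s * 1e9 / bytes_per_token(n_params, bits_per_weight)


if __name__ == "__main__":
    for name, bits in FORMATS.items():
        tps = max_tokens_per_s(BANDWIDTH_GB_S, N_PARAMS, bits)
        print(f"{name:50s} {bits:5.2f} bits/weight -> <= {tps:5.1f} tokens/s")
```

Under those assumptions the padded and fp16 layouts cap decoding at roughly 14 and 7 tokens/s, versus about 25 for reading the 4-bit weights directly. The absolute numbers are made up; the point is that the ceiling scales inversely with bytes read per weight, so any format that inflates what comes off the memory bus costs speed one-for-one.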