I’ve seen this claim a lot, but I’m skeptical. Has anyone actually published benchmarks showing a big speedup from using the NPU for prefill?

AMD’s own marketing numbers put the NPU at about 50 TOPS out of 126 TOPS total compute for the platform. Even if you hand-wave everything else away, adding the NPU on top of the remaining 76 TOPS caps the theoretical upside at about 1.66x (126/76).

But that assumes:

1. Your workload maps cleanly onto the NPU’s 8-bit fast path.

2. There’s no overhead coordinating the iGPU + NPU.
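For anyone who wants to poke at the numbers, here's a rough sketch of that back-of-the-envelope math. It assumes prefill is purely compute-bound and that TOPS add linearly across units, which is generous. The function name and the `npu_utilization` / `coordination_overhead` knobs are hypothetical, and the 0.5 and 0.2 values are illustrative guesses, not measurements.

```python
def prefill_speedup_cap(npu_tops, total_tops,
                        npu_utilization=1.0, coordination_overhead=0.0):
    """Upper bound on speedup from adding the NPU to the rest of the platform.

    npu_utilization: fraction of NPU peak the workload reaches (hypothetical knob;
        1.0 means it maps perfectly onto the 8-bit fast path).
    coordination_overhead: fraction of combined throughput lost to iGPU + NPU
        coordination (hypothetical knob; 0.0 means free coordination).
    """
    baseline = total_tops - npu_tops  # everything except the NPU
    combined = (baseline + npu_tops * npu_utilization) * (1.0 - coordination_overhead)
    return combined / baseline


# Marketing numbers from the comment above: 50 TOPS NPU, 126 TOPS total.
ideal = prefill_speedup_cap(50, 126)
# Illustrative guesses only: half of NPU peak, 20% lost to coordination.
degraded = prefill_speedup_cap(50, 126, npu_utilization=0.5, coordination_overhead=0.2)

print(f"ideal cap: {ideal:.2f}x")      # ideal cap: 1.66x
print(f"degraded:  {degraded:.2f}x")   # degraded:  1.06x
```

With both assumptions holding perfectly you get the 1.66x ceiling. Relax them even moderately and the number slides toward 1.0x, which is why I don't expect much.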

My expectation is the real-world gain would be close to 0, but I'd love to be proven wrong!