From what I can tell, NPUs are mostly being used by Microsoft to encourage vendor lock-in to the MicrosoftML/ONNX platform (similar to their DirectX playbook).
They're used a lot on mobile. Apple uses its "Neural Engine" NPU to power its on-device ML features, and Samsung does something similar in its Exynos processors. Apple also exposes the NPU to developers via CoreML.
Failing to understand why developers chose CUDA is exactly why NVidia keeps selling.
Same applies to proprietary 3D APIs.
There is a reason only FOSS devs make such a big fuss about APIs, while professional game studios have kept talking at GDC about how to push each piece of hardware to its limits, ever since the heterogeneous 8-bit home game systems.