by zozbot234 239 days ago
With respect to language models/transformers, the neural engine/NPU is still potentially useful for the pre-processing (prompt prefill) step, which is generally compute-limited. Token generation is limited by memory bandwidth instead, so for that step GPU compute with neural/tensor accelerators is preferable.
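
To make the compute-limited vs. bandwidth-limited distinction concrete, here is a rough roofline-style sketch. It assumes a single dense weight matrix read once per matmul and ignores activation and KV-cache traffic; the function names and the hardware figures (10 TFLOP/s, 100 GB/s) are made-up placeholders, not measurements of any real NPU or GPU.

```python
def arithmetic_intensity(n_tokens: int, bytes_per_weight: float = 2.0) -> float:
    """FLOPs per byte of weight traffic for an (n_tokens x d) @ (d x d) matmul.

    FLOPs        = 2 * n_tokens * d * d
    weight bytes = d * d * bytes_per_weight
    Activation traffic is ignored (simplifying assumption), so d cancels out.
    """
    return 2.0 * n_tokens / bytes_per_weight


def bound(n_tokens: int, peak_flops: float, mem_bandwidth: float,
          bytes_per_weight: float = 2.0) -> str:
    """Return "compute" or "memory" depending on which limit is hit first."""
    # FLOPs the hardware can perform per byte it can fetch from memory.
    machine_balance = peak_flops / mem_bandwidth
    intensity = arithmetic_intensity(n_tokens, bytes_per_weight)
    return "compute" if intensity > machine_balance else "memory"


# Hypothetical accelerator, for illustration only.
PEAK_FLOPS = 10e12  # 10 TFLOP/s
MEM_BW = 100e9      # 100 GB/s

print(bound(512, PEAK_FLOPS, MEM_BW))  # prefill over a 512-token prompt -> compute
print(bound(1, PEAK_FLOPS, MEM_BW))    # generating one token at a time -> memory
```

Under these assumptions, prefill reuses each weight across all prompt tokens, so extra matmul throughput helps; single-token generation touches every weight for only about one multiply-add each, so it waits on memory regardless of how many FLOPs are available.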
1 comment

I think I'd still rather have that hardware area put into tensor cores for the GPU, instead of a separate unit that's only programmable through ONNX.