by domschl 929 days ago
The main thing about this framework is that it uses unified memory with the GPU. This gives maximum performance. The Neural Engine, on the other hand, is optimized for low-energy inference (which is mostly an advantage on mobile devices), and imposes limitations and restrictions since its hardware supports only very specific neural network operations. Thus supporting the Neural Engine within a universal machine learning platform doesn't make much sense; it would just be a bottleneck.

The way to use the Neural Engine is to convert existing models that strictly adhere to the limitations of its hardware (which excludes many operations used in non-restricted NN models) for use in energy-restricted inference applications only. It's a different application scenario.

2 comments

Could Transformer-based models be converted to work on the NPU?
Thank you for all this specific information!