by fmajid 928 days ago
Still, being able to run LLaMa2 on the NPU would be awesome due to the unified memory. Apple's restricting its use to only Apple-approved models is frankly irksome.
2 comments

The main thing about this framework is that it uses unified memory with the GPU, which gives maximum performance. The Neural Engine, on the other hand, is optimized for low-energy inference (which is mostly an advantage on mobile devices), and it imposes limitations and restrictions since its hardware supports only very specific neural network operations. Thus supporting the Neural Engine within a universal machine learning platform doesn't make much sense; it would just be a bottleneck.

The way to use the Neural Engine is to convert existing models that strictly adhere to the limitations of its hardware (which excludes many operations used in unrestricted NN models), for use in energy-restricted inference applications only. It's a different application scenario.

Could Transformer-based models be converted to work on the NPU?
Thank you for all this specific information!
> Apple's restricting its use to only Apple-approved models is frankly irksome.

I thought you could run arbitrary networks via CoreML, just with limited precision and maybe not every operation available?