by aldonius
462 days ago
I think betting on low-power NPU hardware wasn't necessarily wrong. If you're Apple, you're trying to optimise performance per watt across the system as a whole. So in a context where you're shipping first-party, bespoke on-device ML features, it can make sense to have a modestly sized dedicated accelerator.

I'd say the biggest problem with the NPU is that you can only use it from Core ML. Even MLX can't access it!

As you say, the big world-changing LLMs are scaling up, not down. At the same time (at least so far), LLM usage is intermittent: we want to consume thousands of tokens in a few seconds, but only a couple of times a minute. That's a client-server timesharing model for as long as the compute and memory demands can't fit on a laptop.