Yep, but Apple products don’t spend most of their time running huge models. They are running lots of little ones all the time, using hardware designed for that.
It seems that you're agreeing with what I wrote above. They ship a general-purpose stock system and tailor their compute offering towards that. Accelerating 'lots of little models' fits naturally into what they offer, in a way that a more compute-intensive design might not.
Yep, I misunderstood your point. Thanks for your patience. In my defense, the 'general purpose system' has a lot of model-inference-specific hardware. But not LLM-specific hardware.
If there's an M5 Ultra it'll be interesting to see what they've optimized it for.