| Strongly agree. Local, app-embedded, purpose-built expert models are clearly the future in my mind, for a variety of reasons. Looking at TPUs in Android devices and the Neural Engine in Apple hardware, it's pretty clear. Xcode already has an ML studio, for example, that can not only embed and integrate models in apps but also fine-tune them, etc. It's obvious to me that at some point most apps will have embedded models in the app (or on the device) for specific purposes. No AI can compare to humans, and even we specialize. You wouldn't hire a plumber to perform brain surgery and you wouldn't hire a neurosurgeon to fix your toilet. Mixture of experts with AI models is a thing, of course, but when we look at how we primarily interact with technology and the functionality it provides, it's generally pretty well siloed to specific purposes. A small, purpose-built model trained or tuned for a specific domain and context, working on your on-device data, would likely do nearly as well as, if not better than, even ChatGPT for some applications. Think of the next version of device keyboards doing RAG+LLM through your text messages to generate replies. Stack it up with speech-to-text, vision, multimodal models, and who knows what, and yeah, interesting. Throw in the automatic scaling, latency, and privacy, and the wins really stack up. Some random app developer can integrate a model in their application and scale higher with better performance than ChatGPT without setting money on fire. |
I think that’s only true for delay-intolerant or privacy-focused features. For most situations, a remote model running on an external server will outperform a local model. There is no thermal, battery, or memory headroom for the local model to ever do better. The cost is a delay of a hundred milliseconds at most.
I expect most models triggered on consumer devices to run remotely, with a degraded local service option in case of connection problems.
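To make the "remote by default, degraded local fallback" idea concrete, here is a rough sketch of how I'd picture it. Everything in it is hypothetical: `answer`, `flaky_remote`, and `small_local` are names I made up for illustration, and the stubs stand in for a real server call and a real on-device model.

```python
from typing import Callable, Tuple

def answer(prompt: str,
           remote: Callable[[str], str],
           local: Callable[[str], str]) -> Tuple[str, str]:
    """Try the remote model first; use the local one only on connection problems.

    Returns (which_model_answered, reply).
    """
    try:
        return "remote", remote(prompt)
    except (ConnectionError, TimeoutError):
        # Degraded path: the smaller on-device model answers instead.
        return "local", local(prompt)

# Hypothetical stand-ins for a server-side model and an on-device model.
def flaky_remote(prompt: str) -> str:
    raise ConnectionError("no network")

def small_local(prompt: str) -> str:
    return "short reply to: " + prompt

source, reply = answer("hi", flaky_remote, small_local)
print(source, "|", reply)
```

A real app would presumably also want a timeout on the remote call, so that a slow connection falls through to the local model instead of hanging.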