by kkielhofner 936 days ago

Strongly agree.

Local, app-embedded, and purpose-built targeted experts are clearly the future in my mind for a variety of reasons. Looking at TPUs in Android devices and the Neural Engine in Apple hardware it's pretty clear.

Xcode already has an ML studio, for example, that can not only embed and integrate models in apps but also finetune, etc. It's obvious to me that at some point most apps will have embedded models in the app (or device) for specific purposes.

No AI can compare to humans and even we specialize. You wouldn't hire a plumber to perform brain surgery and you wouldn't hire a neurosurgeon to fix your toilet. Mixture of experts with AI models is a thing of course but when we look at how we primarily interact with technology and the functionality it provides it's generally pretty well siloed to specific purposes.

A small model trained or tuned for a specific domain and context, working on your on-device data, would likely do nearly as well as, if not better than, even ChatGPT for some applications. Think of the next version of device keyboards doing RAG+LLM through your text messages to generate replies. Stack it up with speech to text, vision, multimodal models, and who knows what and yeah, interesting.

Throw in the automatic scaling, latency, and privacy and the wins really stack up.

Some random app developer can integrate a model in their application and scale higher with better performance than ChatGPT without setting money on fire.

2 comments

jorvi 935 days ago

> Local, app-embedded, and purpose-built targeted experts are clearly the future in my mind for a variety of reasons. Looking at TPUs in Android devices and the Neural Engine in Apple hardware it's pretty clear.

I think that’s only true for delay-intolerant or privacy-focused features. For most situations, a remote model running on an external server will outperform a local model. There is no thermal, battery or memory headroom for the local model to ever do better. The cost is a mere hundred milliseconds of delay at most.

I expect most models triggered on consumer devices to run remotely, with a degraded local service option in case of connection problems.
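The remote-first pattern with a degraded local option described above could be sketched roughly like this. It is only an illustration: `infer_with_fallback`, `flaky_remote` and `small_local` are hypothetical names, and the two callables stand in for whatever remote API and on-device model an app would actually use.

```python
from typing import Callable, Tuple


def infer_with_fallback(
    prompt: str,
    remote: Callable[[str], str],
    local: Callable[[str], str],
) -> Tuple[str, str]:
    """Try the remote model first; use the local one if the connection fails."""
    try:
        return "remote", remote(prompt)
    except (ConnectionError, TimeoutError):
        # Degraded path: a smaller on-device model answers instead.
        return "local", local(prompt)


# Hypothetical stand-ins for a real remote API and a real on-device model.
def flaky_remote(prompt: str) -> str:
    raise ConnectionError("no network")


def small_local(prompt: str) -> str:
    return "[local draft] " + prompt[:20]


source, reply = infer_with_fallback(
    "Running late, be there in 10", flaky_remote, small_local
)
print(source, reply)
```

Which side counts as the "fallback" is exactly what the two commenters disagree on; swapping the `remote` and `local` arguments gives the local-first variant.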


kkielhofner 935 days ago

Snapchat filters, iPhone photo processing/speech to text/always-on Hey Siri/OCR/object detection and segmentation - there are countless applications and functionality doing this on device today (and for years). For something like the RAG approach I mentioned the sync and coordination of your local content to a remote API would be more taxing on the battery just in terms of the radio than what we already see from on device neural engines and TPUs as leveraged by the functionality I described.

These applications would also likely be very upload heavy (photo/video inference - massive upload, tiny JSON response) which could very likely end up taxing cell networks further. Even RAG is thousands of tokens in and a few hundred out (in most cases).

There's also the issue of Nvidia GPUs having > 1 yr lead times and the exhaustion of GPUs available from various cloud providers. LLMs especially use tremendous resources for training, and this increase is leading to more and more contention for available GPU resources. People are going to look more and more to save the cloud and big GPUs for what really needs to be done there - big training.

Plus, not everyone can burn $1m/day like ChatGPT.

If AI keeps expanding and eating more and more functionality the remote-first approach just isn't sustainable.

There will likely always be some sort of blend (with serious heavy lifting being cloud, of course) but it's going to shift more and more to local and on-device. There's just no other way.


jorvi 935 days ago

> Snapchat filters, iPhone photo processing/speech to text/always-on Hey Siri/OCR/object detection and segmentation - there are countless applications and functionality doing this on device today (and for years)

But those are peanuts compared to what will be possible in the (near) future. You think content-aware fill is neat? Wait until you can zoom out of a photo 50% or completely change the angle.

That’ll cost gobs of processing power and thus time and battery, much more than a 20MB burst transfer of a photo and the backsynced modifications.

> If AI keeps expanding and eating more and more functionality the remote-first approach just isn't sustainable.

It’ll definitely create a large moat around companies with lots of money or extremely efficient proprietary models.


kkielhofner 935 days ago

> That’ll cost gobs of processing power and thus time and battery

The exact same thing was said about the functionality we're describing, yet there it is. Imagine describing that to someone in 2010 who's already complaining about iPhone battery life. The response would be a carbon copy of yours.

In the five years from the iPhone 8 to the iPhone 14, TOPS on the neural engine went from 0.6 to 17[0]. The iPhone 15 more than doubled that and stands at 35 TOPS[1]. Battery life is better than ever, and that's a 58x gain in the neural engine alone, not even counting GPU, CPU, performance cores, etc.

Over that same period of time Nvidia GPUs only increased about 9x[2] - they're pushing the fundamentals much harder as a law of large numbers-ish issue.
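For anyone who wants to check the multipliers, here is the arithmetic as a quick sketch. It uses only the figures quoted above (0.6, 17 and 35 TOPS, and the roughly 9x Nvidia number); the variable names are just for illustration.

```python
# Neural engine TOPS figures as quoted in the comment.
tops_iphone8 = 0.6
tops_iphone14 = 17
tops_iphone15 = 35

# Growth multipliers relative to the iPhone 8.
gain_8_to_14 = tops_iphone14 / tops_iphone8
gain_8_to_15 = tops_iphone15 / tops_iphone8

# Nvidia GPU gain over roughly the same period, as quoted (about 9x).
nvidia_gain = 9

print(f"iPhone 8 -> 14: {gain_8_to_14:.1f}x")
print(f"iPhone 8 -> 15: {gain_8_to_15:.1f}x")
print(f"Neural engine gain vs Nvidia gain: {gain_8_to_15 / nvidia_gain:.1f}x")
```

The 58x figure corresponds to the iPhone 8 to iPhone 15 span; the iPhone 8 to iPhone 14 span on its own works out to roughly 28x.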

So yeah, I won't have to wait long for zooming out of a photo 50%, completely changing the angle, or who knows what else to be done locally. In fact, for these use cases, increasingly advanced optics, processing, outside visual range sensors, etc. make my point even more - even more data going to the cloud when the device is best suited to be doing it anyway.

Look at it this way - Apple sold over 97 million iPhones in 2023. Assuming the lower averages that's 1,649,000,000 combined TOPS out there.

Cloud providers benefit from optimization and inherent oversubscription but by comparison Nvidia sold somewhere around 500,000,000 TFLOPS worth of H100s last year.
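The back-of-the-envelope numbers above work out as follows. This is a sketch that takes the quoted figures at face value (97 million iPhones, 17 TOPS each as the "lower" figure, 500,000,000 TFLOPS of H100s) and, like the comment, treats TOPS and TFLOPS as roughly comparable, which is an assumption.

```python
# Figures as quoted in the comment.
iphones_sold_2023 = 97_000_000
tops_per_phone = 17              # the lower (iPhone 14 class) figure
h100_tflops_quoted = 500_000_000

# Combined neural engine throughput across one year of iPhone sales.
fleet_tops = iphones_sold_2023 * tops_per_phone
print(f"{fleet_tops:,} combined TOPS")  # 1,649,000,000 combined TOPS

# Ratio against the quoted H100 figure (assumes TOPS ~ TFLOPS for comparison).
ratio = fleet_tops / h100_tflops_quoted
print(f"about {ratio:.1f}x the quoted H100 figure")
```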

Mainframe and serial terminal to desktop to thin client and terminal server - around and around we go.

[0] - https://appleinsider.com/articles/22/09/26/how-iphone-speeds...

[1] - https://www.counterpointresearch.com/insights/iphone-15-usb-...

[2] - https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_proces...


spyUlovedM3 935 days ago

> when we look at how we primarily interact with technology and the functionality it provides it's generally pretty well siloed to specific purposes.

Yes, but the silos in this case will get much bigger, e.g. ChatGPT vs DALL-E.
