| Have you tried running a reasonably sized model locally? You need a minimum of 24GB of VRAM just to load one, and 32GB to be safe, and that isn't even frontier, just the bare minimum. A good analogy is streaming. Sure, you can store the video file locally to get good quality, but it takes up space. Videos are 2-4GB (let's say), and streaming will still always be easier and better. For models, we're looking at 100s of GB worth of model params. There's no way we can squeeze that into, say, 1GB without a loss in quality. So nope, beyond minimal classification and such, on-device isn't happening. -- EDIT: > Nobody wants to be sending EVERY request to someone else's cloud server. We do this already with streaming. You watch YouTube, which hosts videos in the "cloud". For the latest MKBHD video, I don't care about having it locally (for the most part). I just wanna watch the video and be done with it. Same with LLMs. If LLMs are here to stay, most people will wanna use the latest / greatest models. --- EDIT-EDIT: If your response is that Apple will figure it out somehow: nope, Apple is sitting out the AI race. So it has no technology. It has nothing. It has access to whatever open source is available or whatever it can license from others. So nope, Apple isn't pushing the limits. They are watching the world move beyond them. |
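
For anyone who wants to sanity-check the size numbers above, here is a rough back-of-the-envelope sketch. It assumes memory is dominated by the weights (parameter count times bits per parameter) and ignores KV cache, activations, and runtime overhead, so real usage will be higher. The function names and the model sizes in the loop are made up for illustration and do not refer to any specific model.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


def bits_per_param_for_budget(num_params: float, budget_gb: float) -> float:
    """Bits available per parameter if the weights must fit in budget_gb."""
    return budget_gb * 1e9 * 8 / num_params


if __name__ == "__main__":
    # Hypothetical parameter counts, chosen only to show the scaling.
    for params in (7e9, 70e9, 400e9):
        for bits in (16, 8, 4):
            gb = weight_memory_gb(params, bits)
            print(f"{params / 1e9:.0f}B params @ {bits}-bit: {gb:.1f} GB")

    # How aggressive would compression need to be to hit a 1 GB budget?
    print(bits_per_param_for_budget(400e9, 1.0), "bits per parameter")
```

Under these assumptions, fitting a model with hundreds of billions of parameters into 1GB leaves only a small fraction of a bit per parameter, which is the gap the comment is pointing at.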