Hacker News
by jocaal 1182 days ago
You realize that these language models are hundreds of GB in size and consume tens of GB of memory. Last time I checked, Apple still ships its products with below the market average on both storage and memory. If you want a locally running LLM on an iPhone, get ready to sell a kidney.
2 comments

Today you can run an LLM vastly better than Siri on a few GB of RAM using LLaMA 7B at 4-bit quantization and alpaca.cpp. This is moving so fast that every day there is something new. There won't be any moat in LLMs soon, or even in dedicated HW, as it turns out you don't need that much for "basic intelligence".

Note I'm not suggesting you can pack the full knowledge base of humanity into those few GB of RAM, but the key feature of an edge AI is simply to understand instructions, something Siri and OK Google struggle with at best.
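
For a rough sense of the numbers, here is a back-of-the-envelope sketch of weight memory at different precisions. It assumes a nominal 7e9 parameters and counts weights only; activations, the KV cache, and the per-block scale factors that quantized formats store are ignored, so real usage will be somewhat higher. The function name is made up for illustration.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (10^9 bytes).

    Assumption: every parameter is stored at the same bit width and
    nothing else (activations, KV cache, quantization scales) is counted.
    """
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Nominal parameter count for a "7B" model (an assumption, not an exact figure).
    n_params = 7e9
    for bits in (16, 8, 4):
        print(f"7B parameters at {bits}-bit: ~{weight_memory_gb(n_params, bits):.1f} GB")
```

Under those assumptions, going from 16-bit to 4-bit weights cuts the footprint by 4x, which is what brings a 7B model into "a few GB" territory.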

(assuming we're not talking about the near future)

I think this can be a scenario of converging incentives: on one side, large models will incentivize hardware manufacturers to increase the memory available on devices, while on the other side, model developers will be incentivized to trim the fat from their models and devise compression mechanisms that don't compromise quality too much.

It's not hard to imagine a handheld device able to run full inference locally a few device generations in the future.