What are those NLP tasks, if I may ask? (Above, I was thinking about using it as a chatbot like ChatGPT or Bard, which currently seems to be the only application for end users.)
News summarization, news data extraction, news question answering, news filtering. I can assure you that older 7B/13B models had trouble following directions and outputting (for example) JSON.
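For a concrete sense of what "trouble outputting JSON" means, here is a minimal sketch of the kind of check a pipeline might run on a model's reply. The required keys are a hypothetical schema made up for illustration, not anything from an actual setup.

```python
import json

# Hypothetical schema for a news-extraction reply (an assumption for this sketch).
REQUIRED_KEYS = {"title", "summary"}

def parse_model_output(text: str):
    """Return the parsed dict if `text` is a JSON object containing the
    required keys; return None otherwise."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Typical failure: the model wraps the JSON in chatty prose.
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None
    return data

good = '{"title": "Rates held", "summary": "The central bank kept rates unchanged."}'
bad = 'Sure! Here is the JSON you asked for: {"title": "Rates held"}'

print(parse_model_output(good) is not None)
print(parse_model_output(bad) is not None)
```

A reply that fails this check usually has to be retried or repaired, which is where weaker instruction following gets expensive.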
I'm pretty sure Apple won't offer those things locally on iPhones. The hardware requirement is too high and the value to average Apple customers too small.
if you use commodity GPUs, sure. if you use dedicated accelerators like TPUs (Apple already builds its Neural Engine into its chips) the efficiency improvements are massive. seriously, look at some Coral Edge TPUs and what they can do at power levels completely unheard of for GPUs. then look at how well M1/M2 Macs handle machine learning tasks for the power they draw, because they have an onboard accelerator
It's not just inference time; RAM size is another bottleneck. Apple, being Apple, probably wouldn't want to offer anything less than GPT-3.5-level intelligence, which I would estimate at 220 billion parameters (one of the eight experts in the rumored GPT-4 MoE). That would require 220 GB of RAM at 8-bit quantization, i.e. one byte per parameter.
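The RAM estimate is just parameter count times bytes per parameter. A rough sketch, assuming only the weights are counted (no KV cache, activations, or runtime overhead) and 1 GB = 10^9 bytes; the function name is made up for illustration:

```python
def weights_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1e9

# The 220B estimate at a few common precisions.
for bits in (16, 8, 4):
    print(f"220B at {bits}-bit: {weights_gb(220, bits):.0f} GB")

# For comparison, a 7B model at 4-bit.
print(f"7B at 4-bit: {weights_gb(7, 4):.1f} GB")
```

Even at 4-bit the 220B figure comes to about 110 GB, while a 7B model at 4-bit needs about 3.5 GB for weights.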
apple probably has the attention to detail to train the absolute shit out of their models. they will not need 8x220B parameters to do what GPT-4 does, if they ever get to that point. see LLaMA2 7B and 13B being (subjectively) far better than LLaMA1 even with the same number of parameters, just by having been trained more
apple is known to care a lot about stuff like this. like, a lot. they are pedantic as heck