Hacker News
by amitprasad 902 days ago
Also relevant: "LLM in a flash: Efficient Large Language Model Inference with Limited Memory"

Apple seems to be gearing up for significant advances in on-device LLM inference.

https://arxiv.org/abs/2312.11514