by MacsHeadroom 1199 days ago
Yes. Starting with the Facebook version of LLaMA-7B, you just quantize the model to 4-bit on your desktop (since that step takes 14GB of RAM), then move it to your phone and follow the Android instructions in the repo: https://github.com/ggerganov/llama.cpp/#android

I've seen dozens of screenshots of it running in Termux on Android phones by now, at completely usable speeds.
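For a rough sense of where the 14GB figure and the phone-sized result come from, here is a back-of-the-envelope sketch. It assumes a nominal 7 billion parameters, 16-bit weights before quantization, and exactly 4 bits per weight afterwards; `model_size_gb` is a made-up helper for illustration, not part of llama.cpp, and real 4-bit formats carry some per-block overhead, so actual files come out somewhat larger.

```python
def model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes), ignoring any overhead."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: "7B" taken as a nominal 7e9 parameters.
N_PARAMS_7B = 7e9

fp16_gb = model_size_gb(N_PARAMS_7B, 16)  # 16-bit weights, before quantization
q4_gb = model_size_gb(N_PARAMS_7B, 4)     # idealized 4-bit weights

print(f"16-bit weights: ~{fp16_gb:.1f} GB")
print(f" 4-bit weights: ~{q4_gb:.1f} GB")
```

Under these assumptions the 16-bit weights match the 14GB mentioned above, while the 4-bit version is about a quarter of that, which is why the quantized model is the one you move to the phone.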

1 comment

Thank you for the link! Insane that this can run on a phone.

As my current potato computer has 8GB of RAM, I'll ask a friend to do it :-)