Hacker News
by icosahedron 1201 days ago
I followed the initial instructions and the 7B model worked just fine.

I tried the supplementary instructions to download some of the models (7B, 13B, and 30B), and it didn't seem to work. The prompt returned nothing after waiting for several minutes.

Is there a way to run just one of the larger models?

I'm going to test this out today and roll it out as soon as I can, hopefully tomorrow. Stay tuned.
What's the minimum GPU spec required? NVIDIA only? Any differences between Debian and Fedora? How much RAM is required?
This app is CPU-only and gets good speeds even on mobile phone CPUs. The minimum RAM required is 5 GB.
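As a rough sanity check on RAM figures like these: the weights alone take about (parameter count × bits per weight ÷ 8) bytes. The sketch below assumes the nominal parameter counts from the model names (the real checkpoints differ slightly) and ignores runtime overhead such as the context cache and scratch buffers, so treat its numbers as lower bounds. The 5-bit row is an assumed block layout (4-bit codes plus one 32-bit scale per 32 weights), not a documented format.

```python
# Rough weight-memory estimate: params * bits_per_weight / 8 bytes.
# Assumptions: nominal parameter counts taken from the model names, and
# no runtime overhead (context cache, scratch buffers) is counted.

NOMINAL_PARAMS = {"7B": 7e9, "13B": 13e9, "30B": 30e9}

# 16 = fp16 weights; 4 = bare 4-bit codes; 5 = 4-bit codes plus one
# 32-bit scale per block of 32 weights (an assumed block layout).
BIT_WIDTHS = (16, 4, 5)


def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for name, n in NOMINAL_PARAMS.items():
        sizes = ", ".join(f"{b}-bit ~ {weight_gb(n, b):.1f} GB" for b in BIT_WIDTHS)
        print(f"{name}: {sizes}")
```

Under these assumptions 7B comes out at about 14 GB in fp16 and roughly 3.5 to 4.4 GB at 4 bits, which is roughly in line with the 5 GB minimum once some overhead is added. 13B at 4 bits is already 6.5 GB or more before overhead.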
Oh wow, is there any way to do this on Android yet? That would be fun to tinker with, even if it's just the smaller model. Even my older Note 9 has 6 GB of RAM.
Yes. Starting from Facebook's original LLaMA-7B weights, you quantize the model to 4-bit on your desktop (since that step takes 14 GB of RAM), then move the quantized model to your phone and follow the Android instructions in the repo: https://github.com/ggerganov/llama.cpp/#android
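To illustrate what "quantize to 4-bit" means in principle, here is a minimal sketch of symmetric block quantization: each block of 32 weights is stored as one scale plus 32 signed 4-bit codes packed two per byte. The block size, the scale rule (largest magnitude maps to 7), and the nibble layout are all assumptions chosen for illustration; this is not llama.cpp's exact on-disk format or its quantize tool.

```python
# Illustrative 4-bit block quantization (assumption: NOT llama.cpp's exact
# format). Each block of 32 floats becomes one float scale plus 32 signed
# 4-bit codes in [-8, 7], packed two per byte.
from typing import List, Tuple

BLOCK = 32  # weights per block (assumed block size)


def quantize_block(block: List[float]) -> Tuple[float, bytes]:
    """Quantize one block to (scale, 16 packed bytes)."""
    assert len(block) == BLOCK
    amax = max(abs(x) for x in block)
    # Map the largest-magnitude weight to +/-7; an all-zero block keeps scale 0.
    scale = amax / 7.0
    inv = 1.0 / scale if scale else 0.0
    codes = [max(-8, min(7, round(x * inv))) for x in block]
    # Store each signed code as an unsigned nibble (code + 8, in [0, 15]),
    # two nibbles per byte, low nibble first.
    packed = bytes(
        (codes[i] + 8) | ((codes[i + 1] + 8) << 4) for i in range(0, BLOCK, 2)
    )
    return scale, packed


def dequantize_block(scale: float, packed: bytes) -> List[float]:
    """Undo quantize_block (lossy: values are rounded to multiples of scale)."""
    out: List[float] = []
    for byte in packed:
        out.append(((byte & 0x0F) - 8) * scale)
        out.append((((byte >> 4) & 0x0F) - 8) * scale)
    return out
```

Each reconstructed weight is off by at most half a scale step. If the scale were kept as a 32-bit float, a block would take 16 + 4 = 20 bytes instead of 64 bytes in fp16, which is why the quantized file fits on a phone while the conversion itself still needs a bigger machine.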

By now I've seen dozens of screenshots of it running in Termux on Android phones at completely usable speeds.

Thank you for the link! Insane that this can run on a phone.

As my current potato computer has 8GB of RAM, I'll ask a friend to do it :-)

What distro and PC specs have you had success with?
I ran this on my Intel i7-7700K with 32 GB of RAM on Ubuntu 22.04. It ran very slowly, almost one word per second. Not sure if I did something wrong.