| HN Mirror



	by Ms-J 1044 days ago
	I very much appreciate your comment and will look into llama.cpp. Was it from here? https://github.com/ggerganov/llama.cpp Do you have a guide that you followed and could link to me, or was it just from prior knowledge? Also, do you know if I could run Wizard Vicuna on it? That model isn't listed on the above page.

3 comments

hdjfkfbfbr 1043 days ago

Glad to be of help. Yeah, that is the repo.

https://replicate.com/blog/run-llama-locally

I found that guide here on HN.

I run it CPU-only with 16 threads, but yeah, perf is good enough.
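For anyone following along, here is a minimal sketch of what a 16-thread CPU run might look like when launched from Python. The binary path `./main`, the model path, and the helper name `build_llama_cpp_command` are placeholders I made up for illustration; `-m`, `-t`, `-n`, and `-p` are the llama.cpp options for model file, thread count, tokens to generate, and prompt, but check `--help` on your own build since the binary name and options vary between versions.

```python
import shlex


def build_llama_cpp_command(binary, model_path, prompt, threads=16, n_predict=128):
    """Build an argument list for a llama.cpp run (hypothetical helper).

    binary and model_path are placeholders; point them at your own build
    and your own downloaded weights.
    """
    return [
        binary,
        "-m", model_path,       # model file to load
        "-t", str(threads),     # number of CPU threads
        "-n", str(n_predict),   # number of tokens to generate
        "-p", prompt,           # prompt text
    ]


cmd = build_llama_cpp_command("./main", "path/to/model.bin", "Hello, llama")
# Print the command as a copy-pasteable shell line instead of running it.
print(shlex.join(cmd))
```

To actually run it you could pass `cmd` to `subprocess.run(cmd, check=True)` once the paths point at real files.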

BTW, my 6 GB figure is me measuring from htop, so llama2 is likely less.


Ms-J 1043 days ago

Thanks for the starting point. I'll give an update if I'm able to successfully run the other models. I hope it could help the community.


singhrac 1041 days ago

This code runs Llama2 quantized and unquantized in a roughly minimal way: https://github.com/srush/llama2.rs (though extracting the quantized 70B weights takes a lot of RAM). I'm running the 13B quantized model on ~10-11GB of CPU memory.
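As a rough back-of-the-envelope check on figures like these, the sketch below estimates memory for the weights alone as parameters times bits per weight. The bit widths are illustrative assumptions, not necessarily what llama2.rs uses, and real usage will be higher because of the KV cache, activations, and runtime overhead.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumed bit widths for illustration only.
for bits in (4, 8, 16):
    print(f"13B at {bits}-bit: ~{weight_memory_gb(13e9, bits):.1f} GB")
```

Under these assumptions a 13B model needs about 6.5 GB for 4-bit weights and about 26 GB unquantized at 16-bit, which is why quantization is what makes CPU-only runs on ordinary machines practical.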


Ms-J 1036 days ago

From what I gather, this is a Rust implementation that runs Llama2. Can it run any other models like the ones I'm having trouble finding info about?


hdjfkfbfbr 1043 days ago

Not sure about Vicuna myself.
