HN Mirror



	by singhrac 1041 days ago
	This code runs Llama2, both quantized and unquantized, in a roughly minimal way: https://github.com/srush/llama2.rs (though extracting the quantized 70B weights takes a lot of RAM). I'm running the 13B quantized model in ~10-11GB of CPU memory.

1 comment

From what I gather, this is a Rust implementation that runs Llama2. Can it run any other models like the ones I'm having trouble finding info about?