Hacker News
by singhrac 1041 days ago
This code runs Llama2 quantized and unquantized in a roughly minimal way: https://github.com/srush/llama2.rs (though extracting the quantized 70B weights takes a lot of RAM). I'm running the 13B quantized model on ~10-11GB of CPU memory.
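For a rough sense of where a figure like ~10-11GB could come from, here is a back-of-envelope sketch of weight memory versus bits per weight. The function name and the bit widths tried are my own assumptions for illustration, not taken from the repo, and the estimate ignores activations, the KV cache, and any per-group quantization metadata.

```rust
/// Approximate memory for the model weights alone, in gigabytes (10^9 bytes).
/// Hypothetical helper: ignores activations, KV cache, and quantization
/// metadata such as scales and zero points.
fn weight_memory_gb(params: f64, bits_per_weight: f64) -> f64 {
    params * bits_per_weight / 8.0 / 1e9
}

fn main() {
    let params = 13e9; // nominal parameter count for a "13B" model
    // Assumed bit widths, chosen only to show the scaling.
    for bits in [4.0, 8.0, 16.0] {
        println!("{:>2} bits/weight: ~{:.1} GB", bits, weight_memory_gb(params, bits));
    }
}
```

By this estimate 16-bit weights alone would need about 26GB and 8-bit about 13GB, so a 13B model fitting in ~10-11GB suggests fewer than 8 bits per weight, with the rest going to runtime overhead.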
1 comment

From what I gather, this is a Rust implementation that runs Llama2. Can it run other models too, such as the ones I'm having trouble finding info about?