Hacker News
by snowycat, 1073 days ago
I am running 30B LLaMA models (4-bit quantized using llama.cpp) on 32 GB of RAM with no GPU. I get around 2 tokens/second.
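For anyone wondering why this fits in 32 GB, here is a rough back-of-the-envelope sketch. It assumes a nominal 30 billion parameters and exactly 4 bits per weight; real quantized files also store per-block scale data, so treat the result as a lower bound. The function names are made up for illustration, and the read-rate estimate assumes every weight is read from RAM once per generated token, which is a simplification rather than a measurement.

```python
def quantized_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    Ignores per-block scales, the KV cache, and runtime buffers,
    so the real footprint will be somewhat higher.
    """
    return n_params * bits_per_weight / 8 / 1e9


def implied_read_rate_gb_s(weight_gb: float, tokens_per_second: float) -> float:
    """Weight bytes read per second, ASSUMING each token reads all weights once."""
    return weight_gb * tokens_per_second


# Nominal figures from the comment: 30B parameters, 4-bit, about 2 tokens/second.
weights_gb = quantized_weight_gb(30e9, 4.0)
read_rate = implied_read_rate_gb_s(weights_gb, 2.0)

print(f"weights: ~{weights_gb:.1f} GB, implied read rate: ~{read_rate:.1f} GB/s")
```

Under these assumptions the weights take roughly 15 GB, which leaves headroom in 32 GB of RAM for the OS and context buffers, and 2 tokens/second would correspond to streaming on the order of 30 GB/s of weights.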