by snowycat 1073 days ago
I am running 30B LLaMA models (4-bit quantized with llama.cpp) on 32 GB of RAM with no GPU. I get around 2 tokens/second.
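
For anyone wondering why this fits in 32 GB, here is a rough back-of-the-envelope sketch. It assumes a nominal 30 billion parameters and exactly 4 bits per weight; real quantized files are somewhat larger because of per-block scale factors, and the estimate ignores the KV cache and whatever the OS is using. The function name and constants are made up for illustration, not taken from llama.cpp.

```python
def quantized_weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


N_PARAMS = 30e9  # assumption: nominal "30b", not an exact parameter count
RAM_GIB = 32     # treating the comment's 32 GB as 32 GiB for simplicity

for bits in (16, 8, 4):
    size = quantized_weight_gib(N_PARAMS, bits)
    verdict = "fits" if size < RAM_GIB else "does not fit"
    print(f"{bits:>2}-bit: {size:5.1f} GiB ({verdict} in {RAM_GIB} GiB)")
```

At 16 bits the weights alone are well over the available RAM, while at 4 bits they take up less than half of it, which leaves room for the context and the rest of the system.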