Hacker News
by roosgit 772 days ago
About a year ago I bought some parts to build a Linux PC for testing LLMs with llama.cpp. I paid less than $200 for: a B550MH motherboard, AMD Ryzen 3 4100, 16GB DDR4, 256GB NVMe SSD. I already had an old PC case with a 350W PSU, plus a 256MB video card, which I needed because the PC wouldn’t boot without one.

I looked today on Newegg and similar PC components would cost $220-230.

Performance-wise, I get about 9 tokens/s from mistral-7b-instruct-v0.2.Q4_K_M.gguf with a 1024 context size. This is with overclocked RAM, which added 15-20% more speed.
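
For anyone wondering what that implies for stock RAM speeds, here is a rough sketch of the arithmetic. It assumes the 15-20% gain is relative to the speed before overclocking, which the post doesn't state explicitly; the function name is just for illustration.

```python
def baseline_tokens_per_s(measured, gain):
    """Speed before a relative gain was applied: measured = baseline * (1 + gain)."""
    return measured / (1.0 + gain)

measured = 9.0  # tokens/s reported in the post, with overclocked RAM

# Assumption: the 15-20% figure is a relative speedup over stock RAM.
low = baseline_tokens_per_s(measured, 0.20)
high = baseline_tokens_per_s(measured, 0.15)

print(f"estimated stock-RAM speed: {low:.1f}-{high:.1f} tokens/s")
# prints: estimated stock-RAM speed: 7.5-7.8 tokens/s
```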

The Mac Mini is probably faster than this. However, the custom-built PC route gives you the option to add more RAM later on to try bigger models. It also lets you add a decent GPU, something like a used 3060, as one of the comments says.

Comments

FYI for the Mac Mini idea: I have an M1 MacBook Pro with 32GB. There's some sort of limit on how much RAM can be allocated to the GPU. Trying to run even a model that needs 22GB of RAM will fail. The best I've gotten is Code Llama 34B 3-bit at 18.8GB. There can be tons of RAM still free, but the LLM will just loop forever, dropping a chunk of RAM and reloading it from disk.
Yes, Metal seems to allow a maximum of 1/2 of the RAM for one process, and 3/4 of the RAM allocated to the GPU overall. There’s a kernel hack to fix it, but that comes with the usual system integrity caveats. https://github.com/ggerganov/llama.cpp/discussions/2182
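
To put numbers on those fractions, here is a small sketch that applies them to a few RAM sizes. The 1/2 and 3/4 ratios are taken from the comment above as rough observations, not documented limits, and the function name is hypothetical.

```python
def metal_limits_gb(total_ram_gb):
    """Approximate Metal limits using the fractions quoted in the comment above.

    Assumption: 1/2 of RAM for a single process, 3/4 of RAM for the GPU overall.
    These are the commenter's observed ratios, not values read from the system.
    """
    per_process = total_ram_gb / 2
    gpu_total = total_ram_gb * 3 / 4
    return per_process, gpu_total

for ram in (16, 32, 64):
    per_process, gpu_total = metal_limits_gb(ram)
    print(f"{ram}GB RAM: ~{per_process:.0f}GB per process, ~{gpu_total:.0f}GB for the GPU overall")
```

On a 32GB machine this gives roughly 16GB per process and 24GB overall, so treat the output as a ballpark when picking a model size rather than a hard cutoff.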