
by roosgit 652 days ago

I have a separate PC that I access over SSH. I recently bought a GPU for it; before that, I was running models on the CPU alone.

- B550MH motherboard

- Ryzen 3 4100 CPU

- 32GB (2x16) RAM cranked up to 3200MHz (token generation is memory-bound)

- 256GB M.2 NVMe (helps load models faster)

- Nvidia 3060 12GB
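A quick way to see why RAM speed matters for CPU-only generation is to estimate a bandwidth ceiling. This is a rough sketch under stated assumptions, not a measurement: it assumes the 2x16 sticks run in dual channel with a 64-bit (8-byte) bus per channel, treats "3200MHz" as 3200 MT/s, and assumes the Q4_K_M 8B file is about 4.9 GB (an approximate figure, not from the post). If every weight has to be read once per generated token, bandwidth divided by model size bounds tokens per second.

```python
# Rough upper bound on CPU token-generation speed from memory bandwidth.
# ASSUMPTIONS (not from the post): dual-channel DDR4, 8-byte bus per channel,
# "3200MHz" read as 3200 MT/s, and all weights read once per generated token.

def ddr_bandwidth_gb_s(mt_per_s: float, channels: int = 2, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes per second)."""
    return mt_per_s * 1e6 * channels * bus_bytes / 1e9


def max_tokens_per_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Ceiling on generation speed if the whole model streams through memory per token."""
    return bandwidth_gb_s / model_gb


bw = ddr_bandwidth_gb_s(3200)  # DDR4-3200, two channels
model_gb = 4.9                 # ASSUMED approximate size of the Q4_K_M 8B file

print(f"peak bandwidth: {bw:.1f} GB/s")
print(f"generation ceiling: {max_tokens_per_s(bw, model_gb):.1f} t/s")
```

With these assumptions the ceiling comes out around 10 t/s, so the CPU-only generation figure of 8.73 t/s is consistent with generation being limited by memory bandwidth rather than by the 4-core CPU.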

Software-wise, I use llamafile because on the CPU it's 10-20% faster than llama.cpp at prompt processing.

Performance with "Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf":

CPU-only: 23.47 t/s (processing), 8.73 t/s (generation)

GPU: 941.5 t/s (processing), 29.4 t/s (generation)
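To put the CPU and GPU numbers side by side, here is a small sketch that computes the speedup for each phase and turns the rates into wall-clock time for a request. The rates are the ones reported above; the 1,000-token prompt and 200-token answer are hypothetical values chosen only for illustration, and the time model (prompt tokens over processing rate plus output tokens over generation rate) ignores model loading and other overhead.

```python
# Compare the reported CPU-only and GPU throughput, and estimate request time.
# Rates are from the post; the request sizes below are HYPOTHETICAL.

cpu = {"processing": 23.47, "generation": 8.73}  # tokens/s, CPU-only
gpu = {"processing": 941.5, "generation": 29.4}  # tokens/s, RTX 3060 12GB


def speedup(phase: str) -> float:
    """How many times faster the GPU is for one phase."""
    return gpu[phase] / cpu[phase]


def request_seconds(rates: dict, prompt_tokens: int, output_tokens: int) -> float:
    """Prompt-processing time plus generation time, ignoring load/overhead."""
    return prompt_tokens / rates["processing"] + output_tokens / rates["generation"]


for phase in ("processing", "generation"):
    print(f"{phase}: {speedup(phase):.1f}x faster on GPU")

# Hypothetical request: 1,000-token prompt, 200-token answer.
print(f"CPU: {request_seconds(cpu, 1000, 200):.1f} s")
print(f"GPU: {request_seconds(gpu, 1000, 200):.1f} s")
```

The GPU is roughly 40x faster at prompt processing but only about 3.4x faster at generation. That gap fits the usual picture: prompt processing is batched, compute-heavy work that a GPU accelerates well, while generation is mostly limited by how fast the weights can be read from memory, and VRAM is faster than dual-channel DDR4 by a much smaller factor.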