by roosgit
605 days ago
I have a separate PC that I access through SSH. I recently bought a GPU for it; before that I was running it on CPU alone.

- B550MH motherboard
- Ryzen 3 4100 CPU
- 32GB (2x16) RAM cranked up to 3200MHz (token generation is memory bound)
- 256GB M.2 NVMe (helps with loading models faster)
- Nvidia 3060 12GB

Software-wise, I use llamafile because on the CPU it's 10-20% faster than llama.cpp for prompt processing.

Performance with "Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf":

- CPU-only: 23.47 t/s (processing), 8.73 t/s (generation)
- GPU: 941.5 t/s (processing), 29.4 t/s (generation)
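To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python. The throughput figures are the ones quoted above. Everything else is an assumption on my part and not from the original comment: dual-channel DDR4 with an 8-byte bus per channel, a model file of roughly 4.9 GB, and the simplification that every generated token reads all the weights once. The helper names are made up for illustration.

```python
# Throughput figures quoted in the comment, in tokens per second.
cpu = {"processing": 23.47, "generation": 8.73}
gpu = {"processing": 941.5, "generation": 29.4}

# ASSUMPTION (not from the comment): approximate size of the Q4_K_M file in GB.
MODEL_GB = 4.9


def speedup(before, after):
    """How many times faster `after` is than `before`."""
    return after / before


def peak_bandwidth_gbs(mega_transfers_per_s, channels=2, bytes_per_transfer=8):
    """Theoretical peak RAM bandwidth in GB/s.

    ASSUMPTION: dual-channel DDR4, 64-bit (8-byte) bus per channel.
    """
    return mega_transfers_per_s * channels * bytes_per_transfer / 1000


def generation_ceiling(bandwidth_gbs, model_gb):
    """Rough upper bound on tokens/s if each token reads all weights once."""
    return bandwidth_gbs / model_gb


speedups = {phase: speedup(cpu[phase], gpu[phase]) for phase in cpu}
ram_bw = peak_bandwidth_gbs(3200)
ceiling = generation_ceiling(ram_bw, MODEL_GB)

for phase, factor in speedups.items():
    print(f"GPU speedup, {phase}: {factor:.1f}x")
print(f"Theoretical RAM bandwidth: {ram_bw:.1f} GB/s")
print(f"Rough CPU generation ceiling: {ceiling:.1f} t/s (measured: {cpu['generation']} t/s)")
```

Under those assumptions the GPU is roughly 40x faster at prompt processing but only about 3.4x faster at generation, and the measured CPU generation speed sits fairly close to what the RAM bandwidth alone would allow, which fits the memory-bound remark.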