Hacker News
DrBenCarson
37 days ago
How are you using that RAM with the GPU?
canpan
37 days ago
llama.cpp with automatic offload to main memory. You can also use Ollama; it is easier to set up, but slower.
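For anyone wondering what the offload split looks like in practice, here is a rough sketch. It estimates how many layers fit in VRAM and builds a `llama-cli` command that uses llama.cpp's `-ngl` (`--n-gpu-layers`) flag. The helper names, the model size, the layer count, the reserve, and the file name `model.gguf` are all made up for illustration. Treating every layer as the same size is a simplification; real models and the KV cache will shift the numbers.

```python
def gpu_layers_that_fit(model_gib, n_layers, vram_gib, reserve_gib=1.5):
    """Estimate how many layers fit in VRAM (hypothetical helper).

    Assumes all layers are the same size and keeps `reserve_gib` free
    for the KV cache and scratch buffers. Both are rough assumptions.
    """
    per_layer_gib = model_gib / n_layers
    usable_gib = max(vram_gib - reserve_gib, 0.0)
    return min(n_layers, int(usable_gib // per_layer_gib))


def llama_cli_args(model_path, n_gpu_layers):
    """Build a llama-cli argument list; layers not offloaded stay in system RAM."""
    return ["llama-cli", "-m", model_path, "-ngl", str(n_gpu_layers)]


if __name__ == "__main__":
    # Illustrative numbers only: a 20 GiB model with 40 layers on an 8 GiB GPU.
    layers = gpu_layers_that_fit(model_gib=20.0, n_layers=40, vram_gib=8.0)
    print(" ".join(llama_cli_args("model.gguf", layers)))
```

With those made-up numbers, 13 of the 40 layers go to the GPU and the rest run from main memory, which is where the slowdown comes from.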
reverius42
37 days ago
For those who want a GUI, LM Studio does this too (with llama.cpp as the backend, I think). I'm getting great (albeit slow) results with Qwen3.6-35B MoE on 8 GB of GPU RAM and 40 GB of system RAM.