HN Mirror


	by DrBenCarson 37 days ago
	How are you using that RAM with the GPU?

canpan 37 days ago

Llama.cpp with automatic offload to main memory. You can also use Ollama; it is easier, but slower.
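To make the offload idea concrete, here is a rough sketch of how a layer split between GPU and main memory could be worked out. This is not llama.cpp's actual logic, just the general idea behind choosing a GPU layer count (llama.cpp exposes that count as `--n-gpu-layers`). The function name `split_layers` and every number in the example are hypothetical.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def split_layers(n_layers: int, bytes_per_layer: int,
                 vram_bytes: int, reserve_bytes: int = 0) -> tuple[int, int]:
    """Return (gpu_layers, cpu_layers) for a simple greedy split.

    Hypothetical helper: assumes every layer is the same size and that
    reserve_bytes of VRAM is held back for the KV cache and scratch buffers.
    """
    if bytes_per_layer <= 0:
        raise ValueError("bytes_per_layer must be positive")
    usable = max(vram_bytes - reserve_bytes, 0)
    # Put as many whole layers on the GPU as fit; the rest stay in system RAM.
    gpu_layers = min(n_layers, usable // bytes_per_layer)
    return gpu_layers, n_layers - gpu_layers


# Illustrative numbers only: 40 layers of 500 MiB each, 8 GiB VRAM, 1 GiB reserved.
print(split_layers(40, 500 * MIB, 8 * GIB, 1 * GIB))  # (14, 26)
```

The layers that do not fit run from system RAM, which is why this works on a small GPU but costs speed.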

reverius42 37 days ago

For those who want a GUI, LM Studio does this too (with llama.cpp as the backend, I think). I'm getting great (albeit slow) results with Qwen3.6-35B MoE on 8GB of GPU RAM and 40GB of system RAM.
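A back-of-the-envelope check of why a 35B model can fit in 8GB + 40GB: weight size is roughly parameter count times bits per weight, divided by 8. The quantization level is not stated above, so the 4.5 bits per weight below is purely an assumption for illustration, and the estimate ignores the KV cache, activations, and OS overhead. `weight_gb` is a hypothetical helper name.

```python
def weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in decimal GB (weights only)."""
    return params_billions * bits_per_weight / 8


TOTAL_GB = 8 + 40  # GPU RAM plus system RAM from the setup described above

# Bit widths are assumptions: 16 = unquantized half precision,
# 8 and 4.5 = illustrative quantized sizes.
for bits in (16, 8, 4.5):
    size = weight_gb(35, bits)
    print(f"{bits} bits/weight: {size:.1f} GB, fits in {TOTAL_GB} GB: {size < TOTAL_GB}")
```

Under these assumptions the unquantized weights (about 70 GB) would not fit, while a roughly 4.5-bit quantization (about 19.7 GB) leaves room to spare, with most of it sitting in system RAM.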