HN Mirror



	by canpan 82 days ago
	llama.cpp, with automatic offloading to main memory. You can also use Ollama; it is easier, but slower.

1 comment

reverius42 82 days ago

For those who want a GUI, LM Studio does this too (with llama.cpp as the backend, I think). I'm getting great (albeit slow) results with Qwen3.6-35B MoE on 8 GB of GPU RAM and 40 GB of system RAM.
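
Some rough arithmetic shows why a split like this works. The sketch below estimates the quantized weight size, then how many layers fit in VRAM with the rest left in system RAM. That layer count is the kind of value llama.cpp's `--n-gpu-layers` (`-ngl`) option takes. Everything else here is an assumption for illustration: the helper names are hypothetical, 4.5 bits per weight stands in for a typical 4-bit quant, the 48-layer count is made up, the 1.5 GB reserve for KV cache and context is a guess, and all layers are treated as equal-sized. Embedding/output tensors and MoE-specific tensor placement are ignored.

```python
import math


def quantized_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def split_layers(n_params: float, bits_per_weight: float, n_layers: int,
                 vram_gb: float, vram_reserve_gb: float) -> tuple[int, int]:
    """Hypothetical helper: return (layers_on_gpu, layers_in_system_ram).

    Assumes every layer has the same size and that vram_reserve_gb of VRAM
    is kept free for the KV cache and other runtime buffers.
    """
    per_layer_gb = quantized_size_gb(n_params, bits_per_weight) / n_layers
    usable_gb = max(vram_gb - vram_reserve_gb, 0.0)
    on_gpu = min(n_layers, math.floor(usable_gb / per_layer_gb))
    return on_gpu, n_layers - on_gpu


if __name__ == "__main__":
    # Illustrative numbers only: a 35B-parameter model at ~4.5 bits/weight
    # is ~19.7 GB of weights, which fits in 8 GB VRAM + 40 GB RAM combined.
    print(quantized_size_gb(35e9, 4.5))            # about 19.69
    print(split_layers(35e9, 4.5, 48, 8.0, 1.5))   # (15, 33)
```

Under these assumptions, about 15 of 48 layers would sit on the GPU and the remaining 33 would run from system RAM. That is consistent with "great but slow": everything fits, but most of the work is done from slower memory.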
