| HN Mirror



	by bildung 60 days ago
	I currently run qwen3.5-122B (Q4) on a Strix Halo (Bosgame M5) and am pretty happy with it. It's obviously much slower than hosted models: I get ~20 t/s with an empty context, dropping to about 14 t/s with 100k tokens of context filled. No tuning at all, just apt install rocm and rebuilding llama.cpp every week or so.
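
A rough back-of-the-envelope sketch of what those two throughput figures imply in practice. The 1,000-token reply length is an assumed example, not a measurement from the comment, and the helper names are hypothetical. It also assumes a constant decode rate and ignores prompt-processing time, so real wall-clock times would likely be longer.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to generate num_tokens at a constant decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


def relative_slowdown(fast_tps: float, slow_tps: float) -> float:
    """Fraction of throughput lost when going from fast_tps to slow_tps."""
    if fast_tps <= 0:
        raise ValueError("fast_tps must be positive")
    return 1.0 - slow_tps / fast_tps


# Approximate figures quoted in the comment.
EMPTY_CONTEXT_TPS = 20.0  # tokens/s with an empty context
FULL_CONTEXT_TPS = 14.0   # tokens/s with ~100k tokens of context

# Assumption for illustration only: length of a single reply.
REPLY_TOKENS = 1000

if __name__ == "__main__":
    t_empty = generation_seconds(REPLY_TOKENS, EMPTY_CONTEXT_TPS)
    t_full = generation_seconds(REPLY_TOKENS, FULL_CONTEXT_TPS)
    loss = relative_slowdown(EMPTY_CONTEXT_TPS, FULL_CONTEXT_TPS)
    print(f"empty context: {t_empty:.1f} s per {REPLY_TOKENS} tokens")
    print(f"100k context:  {t_full:.1f} s per {REPLY_TOKENS} tokens")
    print(f"throughput lost: {loss:.0%}")
```

Under these assumptions, filling the context costs roughly 30% of decode throughput, so the same reply takes a bit over 40% longer to generate.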