| HN Mirror

	by am17an 137 days ago
	You can still run larger MoE models by offloading the expert weights to the CPU for token generation. They are by and large usable: I get ~50 tokens/second on Kimi Linear 48B (3B active) with a potato PC plus a 3090.
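A rough way to see why this works: token generation is mostly memory-bandwidth bound, and an MoE model only has to read its active parameters for each token. That means the routed experts can sit in slower system RAM without making decoding unusable. The Python sketch below is a back-of-envelope ceiling, not a benchmark, and it is not how any particular runtime schedules work. Several inputs are hypothetical placeholders: the 2B/1B split of the 3B active parameters between RAM and VRAM, the 4.5 bits per weight, and the 50 GB/s of RAM bandwidth. The function name `decode_ceiling_tok_s` is also made up for illustration. The only spec-sheet figure is the 3090's roughly 936 GB/s of memory bandwidth.

```python
"""Back-of-envelope decode-speed ceiling for an MoE model with experts offloaded to the CPU.

Sketch assumptions (not measurements): decoding is purely memory-bandwidth
bound, every active weight is read once per token, and the CPU-side and
GPU-side reads happen one after the other (no overlap).
"""


def decode_ceiling_tok_s(cpu_active_params: float,
                         gpu_active_params: float,
                         bits_per_weight: float,
                         cpu_bw_bytes_s: float,
                         gpu_bw_bytes_s: float) -> float:
    """Upper bound on tokens/second for a single decoding stream."""
    bytes_per_param = bits_per_weight / 8
    # Time to stream the active routed-expert weights that live in system RAM.
    cpu_seconds = cpu_active_params * bytes_per_param / cpu_bw_bytes_s
    # Time to stream the active weights kept in VRAM (attention, shared layers, ...).
    gpu_seconds = gpu_active_params * bytes_per_param / gpu_bw_bytes_s
    return 1.0 / (cpu_seconds + gpu_seconds)


if __name__ == "__main__":
    GB = 1e9
    # Hypothetical placeholders for illustration only.
    bits = 4.5          # a ~4-bit quantization plus some overhead
    ram_bw = 50 * GB    # rough dual-channel desktop RAM bandwidth
    vram_bw = 936 * GB  # RTX 3090 spec-sheet memory bandwidth

    # Assume 2B of the 3B active params are routed experts in RAM and 1B stays on the GPU.
    moe = decode_ceiling_tok_s(2e9, 1e9, bits, ram_bw, vram_bw)
    # A dense 48B model read entirely from RAM, for comparison.
    dense = decode_ceiling_tok_s(48e9, 0, bits, ram_bw, vram_bw)
    print(f"MoE, experts on CPU: <= {moe:.0f} tok/s")
    print(f"Dense 48B on CPU:    <= {dense:.1f} tok/s")
```

The comparison is the point. With the same total size and the same RAM, a dense model must stream all 48B weights per token. The MoE model only streams the few experts the router picked, so its ceiling is more than an order of magnitude higher.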