| HN Mirror

	by nicohayes 263 days ago
	Could you clarify whether the "2B active parameters" figure is counted per token at inference time, and how that cost scales with context length? Specifically, how does MoE routing affect which parameters are activated during inference, and what are the practical implications for latency?