| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by rfdearborn 804 days ago
	The trendline is definitely toward increasing dynamic routing, but I suspect it's more so that MoE/MoD/MoDE enable models to embed additional facts with less superposition within their weights than enable deeper reasoning. Instead I expect deeper reasoning will come through token-wise dynamism rather than layer-wise -- e.g., this recent Quiet-STaR paper in which the model outputs throwaway rationale tokens: https://arxiv.org/abs/2403.09629