by rfdearborn
804 days ago
The trendline is definitely toward more dynamic routing, but I suspect MoE/MoD/MoDE do more to let models embed additional facts in their weights with less superposition than to enable deeper reasoning. Instead, I expect deeper reasoning to come through token-wise dynamism rather than layer-wise. See, e.g., the recent Quiet-STaR paper, in which the model outputs throwaway rationale tokens: https://arxiv.org/abs/2403.09629