by mountainriver 807 days ago
Most of the original MoE implementations for LLMs were in fact recursive.

Could you please elaborate?
The original MoE research Google did on LLMs used nested transformers to scale them. It was a layered approach: at each layer you would have a set of experts, generally routed to by simple heuristics, and each of those models would then call into its own series of experts and combine the results in various ways.

These models were SOTA for their time.
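
To make the nesting concrete, here is a minimal Python sketch of the layered idea described above. It is not Google's code: the names NestedMoE and leaf_expert, the argmax-based routing heuristic, and the averaging combine step are all assumptions chosen for illustration.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def leaf_expert(scale: float) -> Expert:
    """Hypothetical stand-in for a real expert network: scales its input."""
    def run(x: Vector) -> Vector:
        return [scale * v for v in x]
    return run


class NestedMoE:
    """One layer of experts; each expert may itself be a NestedMoE."""

    def __init__(self, experts: Sequence[Expert], top_k: int = 1) -> None:
        if not 1 <= top_k <= len(experts):
            raise ValueError("top_k must be between 1 and len(experts)")
        self.experts = list(experts)
        self.top_k = top_k

    def route(self, x: Vector) -> List[int]:
        # Assumed heuristic: start at the index of the largest input value
        # (wrapped to the expert count) and take top_k consecutive experts.
        start = max(range(len(x)), key=lambda i: x[i]) % len(self.experts)
        return [(start + j) % len(self.experts) for j in range(self.top_k)]

    def __call__(self, x: Vector) -> Vector:
        outputs = [self.experts[i](x) for i in self.route(x)]
        # Assumed combine step: element-wise mean of the chosen outputs.
        return [sum(col) / len(outputs) for col in zip(*outputs)]


# Two inner layers of leaf experts, wrapped by one outer layer.
inner_a = NestedMoE([leaf_expert(1.0), leaf_expert(2.0)])
inner_b = NestedMoE([leaf_expert(10.0), leaf_expert(20.0)])
outer = NestedMoE([inner_a, inner_b])

print(outer([0.5, 3.0]))
```

In this sketch the outer layer picks one inner layer, which in turn picks one of its own leaf experts. Calls only travel down the tree; no expert invokes a layer above it.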

Interesting, but that isn't recursive, as a sub-model cannot invoke a model higher up in the call graph/tree.