Hacker News new | ask | show | jobs
by rajnathani 807 days ago
Could you please elaborate?
1 comments

The original MoE research done by Google around LLMs involved nested transformers to scale them. It was a layered approach where at each layer you would have set of experts, generally routed to by simple heuristics, then each of those models would call into its own series of experts and combine the data in various ways.

These models were SOTA for their time

Interesting, but that isn't recursive as the sub-model cannot invoke a model higher up in the invoke graph/tree.