| HN Mirror



	by mountainriver 807 days ago
	Most of the original MoE implementations around LLMs were, in fact, recursive.

1 comments

rajnathani 807 days ago

Could you please elaborate?


mountainriver 798 days ago

The original MoE research done by Google around LLMs involved nesting transformers to scale them. It was a layered approach: at each layer you would have a set of experts, generally selected by simple routing heuristics, and each of those models would then call into its own series of experts and combine the results in various ways.

These models were SOTA for their time.
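To make the layered routing concrete, here is a minimal sketch of a two-level nested mixture of experts. It is only an illustration, not the architecture from any particular paper: the names `LeafExpert` and `ExpertGroup`, the scaling "experts", and the bucket-one-coordinate routing heuristic are all assumptions made up for this example, and the "combine" step is reduced to returning the single selected child's output.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union

Vector = List[float]


@dataclass
class LeafExpert:
    """Hypothetical stand-in for a real sub-model: just scales its input."""
    scale: float

    def __call__(self, x: Vector) -> Vector:
        return [self.scale * v for v in x]


@dataclass
class ExpertGroup:
    """One level of the hierarchy: a router picks one child per input.

    A child may be a leaf or another ExpertGroup, which is what makes the
    structure nested. Calls only flow from parent to child.
    """
    feature: int  # which input coordinate this level routes on (assumption)
    children: Sequence[Union["ExpertGroup", LeafExpert]]

    def route(self, x: Vector) -> int:
        # Assumed heuristic: bucket one coordinate, clamped to a valid index.
        return min(int(abs(x[self.feature])), len(self.children) - 1)

    def __call__(self, x: Vector) -> Vector:
        return self.children[self.route(x)](x)


# Two top-level experts, each of which routes to its own pair of experts.
model = ExpertGroup(feature=0, children=[
    ExpertGroup(feature=1, children=[LeafExpert(1.0), LeafExpert(2.0)]),
    ExpertGroup(feature=1, children=[LeafExpert(3.0), LeafExpert(4.0)]),
])

print(model([1.0, 0.0]))  # top level picks group 1, which picks its expert 0
```

A real system would use learned gates and weighted combinations of several experts rather than a hard-coded pick of one, but the shape of the call tree is the same.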


rajnathani 798 days ago

Interesting, but that isn't recursive, as a sub-model cannot invoke a model higher up in the call graph/tree.
