| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by mountainriver 799 days ago
	The original MoE research done by Google around LLMs involved nested transformers to scale them. It was a layered approach where at each layer you would have set of experts, generally routed to by simple heuristics, then each of those models would call into its own series of experts and combine the data in various ways. These models were SOTA for their time

1 comments

Interesting, but that isn't recursive as the sub-model cannot invoke a model higher up in the invoke graph/tree.