| HN Mirror



	by ttul 1022 days ago
	There are many MoE architectures, and I suppose we don’t know for sure which one OpenAI is using. The “selection” of the right mix of models is something the network itself learns, and it’s not a complex process. Certainly no more complex than training an LLM.
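To make the "learned selection" concrete, here is a minimal sketch of one common gating scheme: a linear gate scores each expert, the top-k experts are kept, and their outputs are mixed using softmax weights. This is a generic illustration, not OpenAI's (unknown) design. The names `top_k_gate` and `moe_forward` are made up for this example, and the gate weights are fixed by hand here, whereas in a real MoE layer they would be trained jointly with the experts.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_gate(x, gate_weights, k=2):
    """Return {expert_index: mixing_weight} for the k highest-scoring experts.

    gate_weights has one row per expert (hypothetical, hand-set here;
    learned by backprop in a real model).
    """
    # One logit per expert: dot product of the input with that expert's row.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalise over the selected experts only.
    probs = softmax([logits[i] for i in top])
    return dict(zip(top, probs))

def moe_forward(x, gate_weights, experts, k=2):
    """Weighted sum of the outputs of the selected experts."""
    routing = top_k_gate(x, gate_weights, k)
    out = [0.0] * len(x)
    for idx, weight in routing.items():
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out

# Toy experts standing in for sub-networks (assumptions for illustration).
toy_experts = [
    lambda v: [2 * a for a in v],
    lambda v: [a + 1 for a in v],
    lambda v: [0.0 for _ in v],
]
toy_gate = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
```

Calling `moe_forward([2.0, 1.0], toy_gate, toy_experts)` routes the input to experts 0 and 1 only; expert 2 is never evaluated, which is where the compute savings of sparse MoE come from.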

1 comments

axpy906 1022 days ago

“Backend” was a poor choice of word on my part. “Meta-model” is probably a better term.

I hope it did not detract too much from the point about FOSS focusing on subtasks and modalities, given that GPT-4 was built on a $163 million budget.

Finally, good point. We have no idea what OpenAI’s MoE approach is or how it works. I went back to Meta’s 2022 NLLB-200 system paper, and they didn’t even publish the exact details of the router (gate).


ttul 1021 days ago

Yeah, good point on the importance of FOSS focusing on subtasks... because FOSS isn't going to be spending $150M+ training a model any time soon without something like government backing.
