by ttul
1022 days ago
There are many MoE architectures, and I suppose we don’t know for sure which one OpenAI is using. The “selection” of the right mix of models is something a network learns, and it’s not a complex process. Certainly no more complex than training an LLM.
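To make the "selection is learned" point concrete, here is a minimal sketch of one common style of gate: a linear layer scores each expert, the top k are kept, and their outputs are mixed by a softmax over those scores. This is a generic illustration and not OpenAI's or Meta's router, which are unpublished. The names `route`, `moe_forward`, and `gate_weights`, the toy experts, and the random weights are all assumptions made for the example; in a real model the gate weights are trained jointly with the experts.

```python
import math
import random

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route(token, gate_weights, k=2):
    """Return [(expert_index, mixing_weight), ...] for the top-k experts."""
    # One logit per expert: dot product of the token with that expert's gate row.
    logits = [sum(w * x for w, x in zip(row, token)) for row in gate_weights]
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalise over the selected experts only, so the weights sum to 1.
    weights = softmax([logits[i] for i in top])
    return list(zip(top, weights))

def moe_forward(token, gate_weights, experts, k=2):
    """Weighted sum of the outputs of the k selected experts."""
    out = [0.0] * len(token)
    for idx, w in route(token, gate_weights, k):
        y = experts[idx](token)
        out = [o + w * v for o, v in zip(out, y)]
    return out

# Toy setup (hypothetical): 3 "experts" that just scale their input.
random.seed(0)
dim, n_experts = 4, 3
gate_weights = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_experts)]
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0)]
token = [0.5, -1.0, 0.25, 2.0]

print(route(token, gate_weights))
print(moe_forward(token, gate_weights, experts))
```

The only learned part of the gate here is `gate_weights`, which is why the routing itself is cheap compared with training the experts.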
I hope it did not detract too much from the point of focusing on subtasks and modalities for FOSS, as GPT-4 was built on a $163 million budget.
Finally, good point. We have no idea what OpenAI’s MoE approach is or how it works. I went back to Meta’s 2022 NLLB-200 system paper, and they didn’t even publish the exact details of the router (gate).