by axpy906 1022 days ago
The key in that is models. Per the leaked GPT-4 details, it’s not a single model but a mixture of experts (MoE) with 16 experts. There’s probably quite a lot of complexity on the backend in sourcing the right model for the right query. In short, it’s probably better for the OS community to focus on single models for specific tasks, as evidenced by Code Llama. A system like GPT-4 is still difficult to replicate. Getting a model to run on consumer hardware for specific tasks like code gen at almost GPT-4 level is doable.
2 comments

>There’s probably quite a lot of complexity on the backend in sourcing the right model for the right query.

This isn't how sparse MoE models work. There isn't really any complexity like that: routing happens per token, and a different expert can be picked for each one.

Sparse models aren't an ensemble of models.
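
To make the per-token routing concrete, here is a minimal sketch of top-k gating in plain Python. It is a toy under stated assumptions, not how GPT-4 or any particular model does it: the names `route_token` and `moe_layer`, the linear gate, k=2, and the scaling "experts" are all made up for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(token_vec, gate_weights, k=2):
    """Return [(expert_index, weight), ...] for the k highest-scoring experts.

    gate_weights is a hypothetical learned matrix with one row per expert.
    """
    # One logit per expert: dot product of the token with that expert's gate row.
    logits = [sum(w * x for w, x in zip(row, token_vec)) for row in gate_weights]
    # Keep only the k experts with the largest logits (this is the "sparse" part).
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalise over the selected experts so their weights sum to 1.
    weights = softmax([logits[i] for i in top])
    return list(zip(top, weights))


def moe_layer(token_vec, gate_weights, experts, k=2):
    """Weighted sum of the outputs of the k experts chosen for this token."""
    out = [0.0] * len(token_vec)
    for idx, w in route_token(token_vec, gate_weights, k):
        y = experts[idx](token_vec)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out


if __name__ == "__main__":
    random.seed(0)
    dim, num_experts = 3, 4
    # Random gate weights stand in for weights that would normally be trained.
    gate = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(num_experts)]
    # Toy experts: expert i just scales its input by (i + 1).
    experts = [lambda v, s=i + 1: [s * x for x in v] for i in range(num_experts)]
    for token in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]):
        chosen = [idx for idx, _ in route_token(token, gate)]
        print(token, "-> experts", chosen, "->", moe_layer(token, gate, experts))
```

The gate is just another small set of weights inside the layer, and the expert choice is recomputed for every token, so there is no separate system deciding which "model" handles a whole query.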

There are many MoE architectures and I suppose we don’t know for sure which OpenAI is using. The “selection” of the right mix of models is something that a network learns and it’s not a complex process. Certainly no more complex than training an LLM.
“Backend” was a poor choice of word on my part. “Meta-model” is probably a better choice of wording.

I hope that did not detract too much from the point that FOSS should focus on subtasks and modalities, as GPT-4 was built on a $163 million budget.

Finally, good point. We’ve got no idea what OpenAI’s MoE approach is or how it works. I went back to Meta’s 2022 NLLB-200 system paper and they didn’t even publish the exact details of the router (gate).

Yeah, good point on the importance of FOSS focusing on subtasks... because FOSS isn't going to be spending $150M+ training a model any time soon without something like government backing.