| HN Mirror



	by axpy906 1022 days ago
	The key in that is models. Per the leaked GPT-4 details, it’s not a single model but a mixture of experts (MoE) with 16 experts. There’s probably quite a lot of complexity on the backend in sourcing the right model for the right query. In short, it’s probably better for the open-source community to focus on single models for specific tasks, as evidenced by Code Llama. A system like GPT-4 is still difficult to replicate. Getting a model to run on consumer hardware for specific tasks like code gen at almost GPT-4 level is doable.

2 comments

famouswaffles 1022 days ago

>There’s probably quite a lot of complexity on the backend in sourcing the right model for the right query.

This isn't how sparse MoE models work. There isn't really any complexity like that. Routing happens per token, so different experts will, or at least can, be picked for each token.

Sparse models aren't an ensemble of models.
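
To make the per-token point concrete, here is a minimal sketch of top-k routing in plain Python. Everything in it is an assumption for illustration: the names `route_token` and `moe_layer`, the hand-picked gate weights, and the toy "experts" that just scale their input. It says nothing about how GPT-4 or any real system actually routes.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a small list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(token_vec, gate_weights, k=2):
    """Return [(expert_index, weight)] for the top-k experts of ONE token."""
    # One gate logit per expert: dot product of the token with that expert's gate row.
    logits = [sum(w * x for w, x in zip(row, token_vec)) for row in gate_weights]
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalise over the selected experts only.
    weights = softmax([logits[i] for i in top])
    return list(zip(top, weights))

def moe_layer(tokens, gate_weights, experts, k=2):
    """Apply the sparse layer: each token only runs through its own top-k experts."""
    outputs = []
    for vec in tokens:
        out = [0.0] * len(vec)
        for idx, w in route_token(vec, gate_weights, k):
            expert_out = experts[idx](vec)
            out = [o + w * e for o, e in zip(out, expert_out)]
        outputs.append(out)
    return outputs

# Toy setup (hypothetical): 4 experts that scale the input by different factors.
experts = [lambda v, s=s: [s * x for x in v] for s in (1.0, 2.0, 3.0, 4.0)]
# In a real model the gate is learned jointly with the experts; here it is fixed by hand.
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
tokens = [[2.0, 0.5], [-1.0, 3.0]]

for vec in tokens:
    print(vec, "->", [idx for idx, _ in route_token(vec, gate_weights)])
```

The two tokens land on different pairs of experts inside the same layer, and there is no separate system choosing a whole model per query: the gate is just another small set of weights in the network.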


ttul 1022 days ago

There are many MoE architectures and I suppose we don’t know for sure which OpenAI is using. The “selection” of the right mix of models is something that a network learns and it’s not a complex process. Certainly no more complex than training an LLM.


axpy906 1022 days ago

“Backend” was a poor choice of word on my part. “Meta-model” is probably a better choice of wording.

I hope it did not detract too much from the point of focusing on subtasks and modalities for FOSS, as GPT-4 was built on a $163 million budget.

Finally, good point. We’ve got no idea what OpenAI’s MoE approach is or how it works. I went back to Meta’s 2022 NLLB-200 system paper and they didn’t even publish the exact details of the router (gate).


ttul 1021 days ago

Yeah, good point on the importance of FOSS focusing on subtasks... because FOSS isn't going to be spending $150M+ training a model any time soon without something like government backing.
