by axpy906
1022 days ago
The key word there is "models." Per the leaked GPT-4 details, it's not a single model but a mixture of experts (MoE) with 16 experts. There's probably quite a lot of complexity on the backend in routing each query to the right model. In short, it's probably better for the open-source community to focus on single models for specific tasks, as Code Llama shows. A system like GPT-4 is still difficult to replicate, but getting a model to run on consumer hardware at close to GPT-4 level for a specific task like code generation is doable.
This isn't how sparse MoE models work. There isn't really any complexity like that, and a different set of experts can be picked for each token.
Sparse models aren't an ensemble of models.
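To make the per-token point concrete, here is a toy sketch of a single sparse MoE layer. The routing is per token, not per query: a small learned router (a linear layer) scores every expert for each token, and only the top-k experts run on that token. Nothing here reflects GPT-4's actual configuration. The sizes (dim=4, 16 experts, top_k=2), the random weights, and the choice to make each expert a single linear map are illustrative assumptions. Real experts are feed-forward blocks, and the router is trained jointly with them.

```python
import math
import random

def softmax(xs):
    # Numerically stable softmax over a short list of logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(w, x):
    # w is a list of rows (out_dim x in_dim); x has length in_dim.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

class SparseMoELayer:
    """Toy sparse MoE layer (hypothetical sizes, random weights).

    One router plus num_experts experts inside the SAME layer of the SAME
    model. Each expert is a single linear map here purely for brevity.
    """

    def __init__(self, dim, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        self.router = random_matrix(num_experts, dim, rng)  # gating weights
        self.experts = [random_matrix(dim, dim, rng) for _ in range(num_experts)]

    def route(self, token):
        # Score all experts for this token, keep the top_k, and softmax
        # over just the selected logits to get mixing weights.
        logits = matvec(self.router, token)
        ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        chosen = ranked[: self.top_k]
        weights = softmax([logits[i] for i in chosen])
        return chosen, weights

    def forward_token(self, token):
        # Only the chosen experts are evaluated; the rest are skipped,
        # which is what makes the layer "sparse".
        chosen, weights = self.route(token)
        out = [0.0] * len(token)
        for expert_idx, w in zip(chosen, weights):
            y = matvec(self.experts[expert_idx], token)
            out = [o + w * yi for o, yi in zip(out, y)]
        return out, chosen

    def forward(self, tokens):
        # Each token in the sequence is routed independently.
        return [self.forward_token(t) for t in tokens]

if __name__ == "__main__":
    layer = SparseMoELayer(dim=4, num_experts=16, top_k=2)
    rng = random.Random(1)
    tokens = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(5)]
    for position, (_, chosen) in enumerate(layer.forward(tokens)):
        print(f"token {position}: experts {chosen}")
```

Running it typically prints a different pair of expert indices for different tokens in the same sequence. That is the reply's point: there is no separate backend choosing a whole model per query, just a router inside each MoE layer picking experts token by token.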