by russianGuy83829
734 days ago
It seems like this can’t run arbitrary models and needs custom ones trained from scratch: “We introduce two new models: TurboSparse-Mistral-7B and TurboSparse-Mixtral-47B. These models are sparsified versions of Mistral and Mixtral […]. Notably, our models are trained with just 150B tokens within just 0.1M dollars”. It remains to be seen how good these custom models are.
https://arxiv.org/abs/2406.05955