by germanjoey
996 days ago
SambaNova just launched something similar to what you're describing. It's a demo of their new chip running a 1T-param MoE model made up of 150 7B Llama 2s, each retrained to be an expert in a different topic. So one of them is a "law" expert, another a "physics" expert, etc. They've got a video here [1] (scroll down slightly) that compares it against a 180B Falcon model running on GPUs on HuggingFace. The MoE results are not only just as good quality-wise, but also ridiculously fast. Like, nearly instant. A big benefit is that the experts can be swapped out and retrained with new data, which is obviously not as easy with the more monolithic 180B model.

[1] https://sambanova.ai/launch2023