|
|
|
|
|
by dragonwriter
929 days ago
|
|
> Would this allow you to run each expert on a cheap commodity GPU card so that instead of using expensive 200GB cards we can use a computer with 8 cheap gaming cards in it? I would think no differently than you can run a large regular model on a multiGPU setup (which people do!). Its still all one network even if not all of it is activated for each token, and since its much smaller than a 56B model, it seems like there are significant components of the network that are shared. |
|