by MacsHeadroom
929 days ago
Ah, good catch. Upon even closer examination, the attention layer (~2B params) is shared across experts. So in theory you would need 2B for the shared attention layer + 5B for each of two experts in RAM. That's a total of 12B, meaning this should be runnable on the same hardware as 13B models, with some loading time between generations.
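To spell out that arithmetic, here is a rough sketch in Python. The function names are made up for illustration, and the bytes-per-parameter figures (2 for fp16, 0.5 for 4-bit quantization) are my own assumptions rather than anything measured. It counts weights only and ignores the KV cache, activations, and runtime overhead.

```python
def resident_params_b(shared_b, expert_b, experts_in_ram):
    """Billions of parameters that must sit in RAM at once:
    the shared attention weights plus each resident expert."""
    return shared_b + expert_b * experts_in_ram


def weight_gb(params_b, bytes_per_param):
    """Approximate weight size in decimal GB.
    Billions of params times bytes per param gives GB directly."""
    return params_b * bytes_per_param


# Figures from the comment: ~2B shared attention, ~5B per expert, 2 experts loaded.
total_b = resident_params_b(shared_b=2, expert_b=5, experts_in_ram=2)
print(f"resident params: {total_b}B")

# Assumed precisions (not from the comment): fp16 and 4-bit.
for label, bytes_per_param in [("fp16", 2), ("4-bit", 0.5)]:
    print(f"{label}: ~{weight_gb(total_b, bytes_per_param)} GB of weights")
```

The same functions applied to a dense 13B model give a slightly larger footprint, which is the basis for the comparison above. The loading time is not modeled here.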