by Aurornis
69 days ago
Additional VRAM is needed for the context (the KV cache). This is a MoE model with only 3B active parameters per token, which makes it work well with partial CPU offload. In practice, you can run the -A(N)B models on systems with somewhat less VRAM than the full weights would require. The more you offload to the CPU, though, the slower it gets.
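A back-of-the-envelope sketch of why partial offload degrades gracefully for MoE models. It assumes, hypothetically, a model with 30B total and 3B active parameters, 4-bit weights (0.5 bytes per parameter), a fixed VRAM reservation for context, and that offloaded weights are spread uniformly, so the share of active parameters read from CPU memory per token matches the share of all weights kept on the CPU. These are illustrative numbers, not measurements.

```python
def offload_estimate(total_params_b, active_params_b, bytes_per_param,
                     vram_gb, context_gb):
    """Rough GPU/CPU weight split for a model under partial CPU offload.

    Parameter counts are in billions, so params * bytes_per_param gives GB.
    Assumption: offloaded weights are spread uniformly across the model.
    """
    weights_gb = total_params_b * bytes_per_param
    # VRAM left for weights after reserving space for the context (KV cache)
    gpu_budget_gb = max(vram_gb - context_gb, 0.0)
    cpu_gb = max(weights_gb - gpu_budget_gb, 0.0)
    cpu_fraction = cpu_gb / weights_gb if weights_gb else 0.0
    # Only the active parameters are read per token; a dense model reads all of them
    moe_cpu_read_gb = active_params_b * bytes_per_param * cpu_fraction
    dense_cpu_read_gb = weights_gb * cpu_fraction
    return {
        "weights_gb": weights_gb,
        "cpu_gb": cpu_gb,
        "cpu_fraction": cpu_fraction,
        "moe_cpu_read_gb_per_token": moe_cpu_read_gb,
        "dense_cpu_read_gb_per_token": dense_cpu_read_gb,
    }


if __name__ == "__main__":
    # Hypothetical: 30B total / 3B active, 4-bit weights, 16 GB VRAM, 2 GB for context
    est = offload_estimate(30, 3, 0.5, vram_gb=16, context_gb=2)
    for key, value in est.items():
        print(f"{key}: {value:.3f}")
```

With these assumed numbers, 1 GB of the 15 GB of weights spills into system RAM. Per token, the MoE model reads about 0.1 GB from CPU memory, while a dense model of the same total size at the same offload fraction would read about 1 GB. The per-token CPU read grows linearly with the offloaded fraction, which is why it gets slower the more you offload.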
Or can only whole layers be offloaded? That would affect all experts, though.