by mwcampbell
140 days ago
It's a 400B-parameter model, but a sparse MoE with only 13B active parameters per token. Would it run well on an NVIDIA DGX Spark with 128 GB of unified RAM, or do you practically need to hold the full model in RAM even with sparse MoE?
That said, there are folks out there doing it. https://github.com/lyogavin/airllm is one example.
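
For a rough sense of the numbers in the question, here is a back-of-the-envelope sketch of weight memory at a few quantization levels. The 400B total, 13B active, and 128 GB figures come from the comment above; the bit widths are illustrative assumptions, 1 GB is taken as 10^9 bytes, and KV cache, activations, and runtime overhead are ignored, so real requirements would be higher.

```python
# Rough weight-memory arithmetic for a sparse MoE model.
# Assumptions (not from the thread): the bit widths below, 1 GB = 1e9 bytes,
# and no allowance for KV cache, activations, or runtime overhead.

TOTAL_PARAMS = 400e9   # total parameters, from the comment
ACTIVE_PARAMS = 13e9   # active parameters per token, from the comment
RAM_GB = 128           # unified RAM on the machine in question


def weights_gb(params: float, bits_per_param: float) -> float:
    """Return the size of the weights in GB at the given bit width."""
    return params * bits_per_param / 8 / 1e9


for bits in (16, 8, 4, 2):
    total = weights_gb(TOTAL_PARAMS, bits)
    active = weights_gb(ACTIVE_PARAMS, bits)
    verdict = "fits" if total <= RAM_GB else "does not fit"
    print(f"{bits:>2}-bit: full model {total:6.1f} GB ({verdict} in {RAM_GB} GB), "
          f"active per token {active:5.1f} GB")
```

Under these assumptions the full weights only drop below 128 GB at around 2 bits per parameter, while the active set per token is small at any of these widths. Since which experts are active can change from token to token, anything not resident in RAM would have to be streamed from storage on demand, which is the trade-off that layer-streaming approaches accept.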