|
|
|
|
|
by hmottestad
806 days ago
|
|
Since it’s a MOE model it will only need to load a few of the 8 sub models into vram in order to answer a query. So it may look large, but I think a quantized model will easily fit on a Mac with 64GB of memory and maybe even a bit fewer bits and it’ll fit into 32GB. I think it might be the end for 24GB 4090 cards though :( |
|
But, still, its going to need 262GB for weights + a variable amount based on context without quantization, and 66GB+ at 4-bit quantization.