by wgd
451 days ago
You can run a 4-bit quantized version at a small (though nonzero) cost to output quality, so you would only need 16GB for that. It's also entirely possible to run a model that doesn't fit in available GPU memory. It will just be slower.
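For a rough sense of where a figure like 16GB comes from: weight memory is about parameter count × bits per weight / 8 bytes. Here's a back-of-the-envelope sketch. The 32-billion-parameter count is a hypothetical chosen for illustration (it's the size at which 4-bit weights come out to 16GB), since the comment doesn't name the model. It also counts weights only, ignoring KV cache, activations, and quantization overhead, so real usage runs somewhat higher.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in decimal GB (1 GB = 1e9 bytes).

    Ignores activations, KV cache, and per-block quantization overhead
    (scales/zero-points), so actual usage will be somewhat higher.
    """
    bytes_total = num_params * bits_per_weight / 8  # 8 bits per byte
    return bytes_total / 1e9


# Hypothetical 32B-parameter model, purely for illustration.
PARAMS = 32e9
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {weight_memory_gb(PARAMS, bits):.0f} GB")
```

Each halving of bits per weight halves the weight footprint, which is why dropping from 16-bit to 4-bit cuts memory to a quarter.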