Hacker News new | ask | show | jobs
by wgd 451 days ago
You can run 4-bit quantized version at a small (though nonzero) cost to output quality, so you would only need 16GB for that.

Also it's entirely possible to run a model that doesn't fit in available GPU memory, it will just be slower.