| HN Mirror



	by wgd 451 days ago
	You can run a 4-bit quantized version at a small (though nonzero) cost to output quality, so you would only need 16GB for that. Also, it's entirely possible to run a model that doesn't fit in available GPU memory; it will just be slower.
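
	For a rough sense of where a figure like 16GB comes from, here is a back-of-the-envelope sketch. The 32-billion parameter count is an assumption chosen for illustration (the comment does not name the model's size), and `weight_memory_gb` is a hypothetical helper. It counts only the weights and ignores the KV cache, activations, and the small per-block overhead that quantization formats add, so real usage will be somewhat higher.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    bytes_per_weight = bits_per_weight / 8
    return num_params * bytes_per_weight / 1e9


# Assumed parameter count for illustration only; not stated in the thread.
ASSUMED_PARAMS = 32e9

for bits in (16, 8, 4):
    gb = weight_memory_gb(ASSUMED_PARAMS, bits)
    print(f"{bits}-bit weights: about {gb:.0f} GB")
```

	Under that assumption, going from 16-bit to 4-bit weights cuts the weight footprint to a quarter, which is what would bring such a model down to roughly 16GB.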