HN Mirror



	by jwitthuhn 806 days ago
	It is ~260GB, presumably with fp16 weights. It should fit into 64GB at 3-bit quantization (~49GB). Edit: To add to this, I've had good luck getting solid output out of Mixtral 8x7B at 3-bit, so 3-bit isn't small enough to completely kill the model's quality.
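
	For anyone who wants to check the arithmetic behind the ~49GB figure, here is a rough sketch. It assumes the 260GB is all fp16 weights (16 bits each) and that quantization shrinks the size in proportion to bits per weight. The function name `quantized_size_gb` is made up for illustration. Real quantized files usually come out somewhat larger than this, since the formats also store scales and often keep some tensors at higher precision, and the KV cache needs memory on top of the weights.

```python
def quantized_size_gb(fp16_size_gb: float, bits_per_weight: float) -> float:
    """Estimate weight size after quantization by scaling from 16 bits per weight.

    Assumption: the checkpoint is entirely fp16 weights, with no overhead
    for quantization scales or tensors kept at higher precision.
    """
    return fp16_size_gb * bits_per_weight / 16.0


fp16_gb = 260.0  # approximate checkpoint size quoted in the comment above

for bits in (8, 4, 3):
    size = quantized_size_gb(fp16_gb, bits)
    verdict = "fits" if size < 64 else "does not fit"
    print(f"{bits}-bit: ~{size:.1f} GB ({verdict} in 64 GB, weights only)")
```

	At 3 bits this gives 260 * 3 / 16 = 48.75, which matches the ~49GB above.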

1 comment

I wonder, can you quantize it yourself with some tool?

llama.cpp can quantize a model for you:

Thanks!!