| HN Mirror



	by int_19h 1088 days ago
	llama-30B (which is actually 33B) and its derivatives generally run fine with 4-bit quantization on a single RTX 3090 or 4090 (24 GB of VRAM each), although depending on the group size used for quantization, you may need to dial the context size down slightly.
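
To see roughly why group size and context size trade off against each other, here is a back-of-the-envelope sketch. It assumes the published LLaMA 33B shape (about 32.5B parameters, 60 layers, hidden size 6656), a GPTQ-style layout where each group of weights stores one fp16 scale plus one 4-bit zero point, an fp16 KV cache, and a made-up 1.5 GiB reserve for activations and runtime overhead. It also pretends every parameter is quantized, which is a simplification. The function names are hypothetical, and the results are estimates, not measurements.

```python
GIB = 1024 ** 3

# Published LLaMA "30B"/33B shape: ~32.5B parameters, 60 layers, hidden size 6656.
N_PARAMS = 32.5e9
N_LAYERS = 60
HIDDEN = 6656


def weight_bytes(n_params, bits=4, group_size=None, scale_bits=16, zero_bits=4):
    """Approximate size of the quantized weights in bytes.

    Assumption: each group of `group_size` weights carries one scale and one
    zero point, so smaller groups mean more metadata per weight.
    group_size=None models no per-group metadata at all.
    """
    overhead_bits = 0.0 if group_size is None else (scale_bits + zero_bits) / group_size
    return n_params * (bits + overhead_bits) / 8


def kv_cache_bytes(context, n_layers=N_LAYERS, hidden=HIDDEN, bytes_per_value=2):
    """KV cache size: one key and one value vector per layer per token (fp16 assumed)."""
    return 2 * n_layers * hidden * bytes_per_value * context


def max_context(vram_gib, group_size, reserve_gib=1.5):
    """Rough upper bound on context length that fits next to the weights.

    reserve_gib is an assumed allowance for activations and runtime overhead.
    """
    free = (vram_gib - reserve_gib) * GIB - weight_bytes(N_PARAMS, group_size=group_size)
    return max(0, int(free // kv_cache_bytes(1)))


if __name__ == "__main__":
    for gs in (None, 128, 32):
        w = weight_bytes(N_PARAMS, group_size=gs) / GIB
        print(f"group size {gs}: weights ~{w:.1f} GiB, "
              f"estimated max context on 24 GiB ~{max_context(24, gs)} tokens")
```

The absolute numbers depend heavily on the assumed reserve and on the inference backend, so treat them as illustrative only. The trend is the point: a smaller group size costs more VRAM for the weights and leaves less room for the KV cache.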