| HN Mirror

	by tgtweak 97 days ago
	Depends entirely on quantization. Q6_K with max context length (262144) takes ~40GB of VRAM. Q8 with the same context wouldn't fit in 48GB of VRAM, but it did with 128k of context.
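For anyone who wants to sanity-check numbers like these, here is a rough back-of-the-envelope sketch: quantized weights plus an fp16 KV cache. The comment doesn't name the model, so the shape used in the example (`n_params`, `n_layers`, `n_kv_heads`, `head_dim`) is purely hypothetical, and the function names are made up for illustration. The bits-per-weight figures are the nominal ones for llama.cpp's Q6_K and Q8_0 formats. Real usage also includes compute buffers and runtime overhead, which this ignores, so treat the result as a lower bound rather than a prediction.

```python
# Nominal bits per weight for two llama.cpp quantization formats.
BITS_PER_WEIGHT = {"Q6_K": 6.5625, "Q8_0": 8.5}

GIB = 1024 ** 3


def weights_gib(n_params, quant):
    """Approximate size of the quantized weights in GiB."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / GIB


def kv_cache_gib(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Approximate KV cache size in GiB (fp16 cache by default).

    Each layer stores one K and one V vector of n_kv_heads * head_dim
    elements per token, hence the leading factor of 2.
    """
    elems = 2 * n_layers * n_kv_heads * head_dim * context_len
    return elems * bytes_per_elem / GIB


def estimate_vram_gib(n_params, quant, n_layers, n_kv_heads, head_dim, context_len):
    """Weights plus KV cache; ignores compute buffers and other overhead."""
    return weights_gib(n_params, quant) + kv_cache_gib(
        n_layers, n_kv_heads, head_dim, context_len
    )


if __name__ == "__main__":
    # Hypothetical model shape, NOT the model discussed in the comment.
    shape = dict(n_params=30e9, n_layers=48, n_kv_heads=4, head_dim=128)
    for quant in ("Q6_K", "Q8_0"):
        for ctx in (262144, 131072):
            total = estimate_vram_gib(quant=quant, context_len=ctx, **shape)
            print(f"{quant} @ {ctx} tokens: ~{total:.1f} GiB")
```

The KV cache term scales linearly with context length, so halving the context from 262144 to 128k frees a fixed chunk of memory regardless of quantization. That fits the pattern above, where Q8 only fit once the context was cut down.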