	by mongrelion 95 days ago
	Which quantization are you running, and at what context size? 32 tok/s for that model on that card sounds pretty good to me!