HN Mirror



	by regularfry 110 days ago
	I've got the unsloth q4_K_XL 35b running in llama.cpp on an i9/64G/4090 machine doing double-digit tokens per second with a 90k+ token context window available. The model's completely in VRAM.
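
	A rough back-of-the-envelope sketch of why a setup like this can fit on a 24 GB card. The bits-per-weight figure is an assumption (a ballpark for 4-bit K-quants, not something stated in the comment), the parameter count is taken as a flat 35 billion, and whatever headroom is left has to cover the KV cache and compute buffers, which this does not model.

	```python
	# Rough VRAM estimate for quantized model weights.
	# ASSUMPTIONS (not from the comment): ~4.8 bits per weight on average,
	# exactly 35e9 parameters, and 24 GiB of usable VRAM on the 4090.

	GIB = 2**30


	def quantized_weight_gib(n_params: float, bits_per_weight: float) -> float:
	    """Approximate size of the quantized weights in GiB."""
	    return n_params * bits_per_weight / 8 / GIB


	N_PARAMS = 35e9          # "35b" from the comment, taken at face value
	BITS_PER_WEIGHT = 4.8    # assumed average for a 4-bit K-quant
	VRAM_GIB = 24            # RTX 4090

	weights_gib = quantized_weight_gib(N_PARAMS, BITS_PER_WEIGHT)
	headroom_gib = VRAM_GIB - weights_gib  # left for KV cache and buffers

	print(f"weights:  ~{weights_gib:.1f} GiB")
	print(f"headroom: ~{headroom_gib:.1f} GiB")
	```

	Under these assumptions the weights come to a bit under 20 GiB, so a few GiB remain for the context. Whether 90k+ tokens fit in that remainder depends on the model's architecture and the KV cache settings, neither of which the comment specifies.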