	by avereveard 956 days ago
	with llama.cpp and a 12gb 3060 they can fit an entire mistral model at Q5_K_M in ram with the full 32k context. I recommend openhermes-2.5-mistral-7b-16k with USER: ASSISTANT: instructions; it's working surprisingly well for content production (let's say everything except logic and math, but that's not the strong suit of 7b models in general)
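
	A rough back-of-the-envelope check of the memory claim above, as a sketch rather than a measurement. The layer count, KV head count and head size are the published Mistral 7B values; the weight size is an approximate Q5_K_M file size and is an assumption here, as is the fp16 KV cache. llama.cpp's compute buffers and other overhead are not counted, so real usage will be somewhat higher.

```python
# Rough memory estimate for a Mistral-7B-style model at 32k context.
# Assumptions (not from the comment): fp16 KV cache, ~5.1 GB of Q5_K_M
# weights, and no allowance for compute buffers or runtime overhead.

N_LAYERS = 32        # transformer layers in Mistral 7B
N_KV_HEADS = 8       # grouped-query attention: 8 key/value heads
HEAD_DIM = 128       # dimension per attention head
BYTES_PER_ELEM = 2   # fp16 cache entries (assumption)

GIB = 1024 ** 3
WEIGHTS_BYTES = int(5.1e9)  # approximate Q5_K_M weight size (assumption)


def kv_cache_bytes(n_ctx: int) -> int:
    """Bytes needed to store keys and values for n_ctx tokens."""
    # Factor 2 covers the separate K and V tensors in every layer.
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * n_ctx * BYTES_PER_ELEM


def total_bytes(n_ctx: int) -> int:
    """Weights plus KV cache, ignoring all other overhead."""
    return WEIGHTS_BYTES + kv_cache_bytes(n_ctx)


if __name__ == "__main__":
    n_ctx = 32 * 1024
    print(f"KV cache: {kv_cache_bytes(n_ctx) / GIB:.2f} GiB")
    print(f"Total:    {total_bytes(n_ctx) / GIB:.2f} GiB")
```

	Under these assumptions the KV cache at 32k tokens comes to 4 GiB, and weights plus cache stay under the 12 GiB on the card, which is consistent with the claim.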