	by endymi0n 912 days ago
	There’s a brand-new hybrid quantization for Mixtral out that uses 4-bit for the shared neurons and 2-bit for the experts. It doesn’t bleed much perplexity, and it fits the model into a 32 GB machine. Haven’t had it in hand yet, and no link here on mobile, but can’t wait to try.
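
To put rough numbers on the 4-bit/2-bit split, here is a back-of-the-envelope sketch of the weight memory. It rests on assumptions that are not in the comment: the layer shapes are the ones I believe the published Mixtral 8x7B config uses, "shared" is taken to mean attention, router and embedding weights, and quantization overhead (scales, zero points), the KV cache and activations are all ignored. Treat the result as a lower bound, not a measurement.

```python
# Rough weight-memory estimate for a mixed-precision Mixtral quantization.
# ASSUMPTION: these shapes are believed to match the published Mixtral 8x7B
# config; they do not come from the comment above.
HIDDEN = 4096      # model (embedding) dimension
FFN = 14336        # feed-forward inner dimension of each expert
LAYERS = 32        # transformer blocks
EXPERTS = 8        # experts per block
KV_DIM = 1024      # 8 key/value heads * 128 dims per head
VOCAB = 32000      # vocabulary size


def param_counts():
    """Return (expert_params, shared_params), ignoring the tiny norm weights."""
    # Each expert has three FFN matrices of shape HIDDEN x FFN.
    expert = LAYERS * EXPERTS * 3 * HIDDEN * FFN
    # Attention: Q and O are HIDDEN x HIDDEN, K and V are HIDDEN x KV_DIM.
    attention = LAYERS * (2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM)
    # Router: one HIDDEN x EXPERTS matrix per block.
    router = LAYERS * HIDDEN * EXPERTS
    # Input embedding plus output projection.
    embeddings = 2 * VOCAB * HIDDEN
    # ASSUMPTION: "shared" means everything that is not an expert.
    return expert, attention + router + embeddings


def weight_gb(shared_bits=4.0, expert_bits=2.0):
    """Weight storage in GB (1e9 bytes) for the given bits per weight."""
    expert, shared = param_counts()
    total_bits = expert * expert_bits + shared * shared_bits
    return total_bits / 8 / 1e9


if __name__ == "__main__":
    expert, shared = param_counts()
    print(f"expert params: {expert / 1e9:.2f}B, shared params: {shared / 1e9:.2f}B")
    print(f"fp16 weights:          {weight_gb(16, 16):.1f} GB")
    print(f"4-bit shared / 2-bit:  {weight_gb(4, 2):.1f} GB")
```

The experts hold nearly all of the parameters, which is why squeezing them to 2 bits matters most while the shared weights can stay at 4 bits for very little extra memory. Real 2-bit formats usually cost more than 2.0 bits per weight once scales are stored, so pass a higher `expert_bits` for a less optimistic figure.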