| HN Mirror


	by rashidujang 1101 days ago
	Hey there, I was confused by this exact question too. This link might help, written by a contributor to llama.cpp: https://github.com/ggerganov/llama.cpp/pull/1684 TLDR: Quantizing to fewer bits per weight means higher perplexity (roughly, how 'confused' the model is when predicting text it hasn't seen; lower is better). It's a matter of testing it out and choosing a model that fits your available memory. Generally, the higher the number in the quantization name (e.g. Q8 vs. Q4), the better the output quality, at the cost of a larger file and more memory.
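
To make the trade-off concrete, here is a rough Python sketch, not anything taken from llama.cpp itself. It computes perplexity from per-token log-probabilities and picks the largest quantization whose weights fit a memory budget. The `BITS_PER_WEIGHT` table uses round, assumed numbers (real formats carry extra per-block overhead), and the helper names `weights_gib` and `largest_fitting` are made up for illustration.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp of the average negative log-likelihood per token.

    token_log_probs: natural-log probabilities the model assigned to each
    actual next token. Lower perplexity means better predictions.
    """
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# Assumed nominal bit widths, for illustration only. Real quantization
# formats store scales and other metadata, so actual sizes are larger.
BITS_PER_WEIGHT = {"Q2": 2.0, "Q3": 3.0, "Q4": 4.0, "Q5": 5.0, "Q6": 6.0, "Q8": 8.0}

def weights_gib(n_params, bits_per_weight):
    """Approximate size of the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30

def largest_fitting(n_params, budget_gib):
    """Return the highest-bit quantization that fits the budget, or None."""
    fitting = [
        (bits, name)
        for name, bits in BITS_PER_WEIGHT.items()
        if weights_gib(n_params, bits) <= budget_gib
    ]
    return max(fitting)[1] if fitting else None

if __name__ == "__main__":
    n_params = 7_000_000_000  # a hypothetical 7B-parameter model
    for name, bits in BITS_PER_WEIGHT.items():
        print(f"{name}: ~{weights_gib(n_params, bits):.2f} GiB")
    print("Best fit for 4 GiB:", largest_fitting(n_params, 4.0))
```

This only counts the weights; context (KV cache) and runtime buffers need memory on top, so leave some headroom and then compare the candidates' perplexity on your own text.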