by rashidujang 1054 days ago
Hey there, I was confused by this exact question too. This link might help, written by a contributor to llama.cpp: https://github.com/ggerganov/llama.cpp/pull/1684

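TLDR: Lower quantization (fewer bits per weight) means higher perplexity, i.e. the model is more 'confused' when it sees text it wasn't trained on, so its next-token predictions get worse. The trade-off is memory: fewer bits means a smaller file. It's a matter of testing it out and choosing a model that fits your available memory. Generally, the higher the quantization number (more bits), the better the output quality.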
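
If it helps, here's a rough Python sketch of the two ideas: perplexity as the exponential of the average negative log-probability per token, and picking the highest-bit quantization whose weights fit in memory. The function names, the candidate labels, and the nominal bit counts are my own assumptions for illustration, not anything from llama.cpp. Real quantized files carry extra per-block overhead and you also need room for the context, so treat the size estimate as a lower bound.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood), using natural logs."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def approx_weight_gib(n_params, bits_per_weight):
    """Rough size of the weights alone in GiB (ignores block overhead and context memory)."""
    return n_params * bits_per_weight / 8 / 2**30

def pick_quant(n_params, budget_gib, candidates):
    """Return the label of the highest-bit candidate that fits the budget, or None."""
    fitting = [(bits, label) for label, bits in candidates.items()
               if approx_weight_gib(n_params, bits) <= budget_gib]
    return max(fitting)[1] if fitting else None

# Hypothetical labels with nominal bits per weight (assumed, not measured).
CANDIDATES = {"4-bit": 4, "5-bit": 5, "8-bit": 8}

if __name__ == "__main__":
    # A model that assigns probability 0.25 to every token is as
    # "confused" as a uniform guess among 4 options.
    print(perplexity([math.log(0.25)] * 3))
    # 7 billion parameters with about 6 GiB to spare for weights.
    print(pick_quant(7_000_000_000, 6.0, CANDIDATES))
```

The size estimate only narrows the choice. You'd still want to compare perplexity (or just the output quality) of the candidates that fit.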