by rashidujang
1054 days ago
Hey there, I was confused by this exact question too. This link might help; it was written by a contributor to llama.cpp:
https://github.com/ggerganov/llama.cpp/pull/1684
TL;DR: Quantizing to fewer bits per weight means higher perplexity (i.e. how 'confused' the model is when predicting text it hasn't seen). It's a matter of testing it out and choosing a model that fits your available memory. The higher the number in the quantization name (e.g. Q8 vs. Q4), the better the quality (generally), at the cost of a larger file.
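In case it helps to see what perplexity actually measures, here's a rough sketch using the standard definition (exp of the average negative log-likelihood per token). The function name and the probabilities are made up for illustration; this is not llama.cpp's code or real measurements.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(average negative log-likelihood per token)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)

# Hypothetical natural-log probabilities a model assigned to each
# correct next token in the same piece of text (illustrative only).
more_bits = [math.log(0.5)] * 4    # assigns 50% to each correct token
fewer_bits = [math.log(0.25)] * 4  # assigns 25% to each correct token

print(perplexity(more_bits))
print(perplexity(fewer_bits))
```

The model that puts less probability on the correct tokens ends up with the higher perplexity, which is the effect the PR's comparison tables are measuring across quantization types.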