by DougBTX
875 days ago
Nice graphs here: https://github.com/ggerganov/llama.cpp/pull/1684 So, for example, the 2-bit version of the 30B model is much worse than the original, but still better than the 13B model. There are also lots of extra details, e.g. not all of the weights are 2-bit, and even the 2-bit weights cost more than 2 bits each on average, since groups of quantised weights share scale factors that are stored separately.
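
To make the "more than 2 bits" point concrete, here is a rough sketch of the arithmetic. The group size, scale width, and tensor mix below are made-up numbers for illustration, not the actual layout used in that PR, and the function names are my own.

```python
def effective_bits_per_weight(weight_bits, group_size, scale_bits):
    """Average storage per weight when each group of weights shares one scale."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    # The shared scale's cost is spread evenly over the weights in its group.
    return weight_bits + scale_bits / group_size


def average_bits(tensors):
    """Weighted average over (num_weights, bits_per_weight) pairs."""
    total_weights = sum(n for n, _ in tensors)
    total_bits = sum(n * bits for n, bits in tensors)
    return total_bits / total_weights


# Hypothetical layout: 2-bit weights, groups of 16, one 16-bit scale per group.
print(effective_bits_per_weight(2, 16, 16))  # 3.0

# Hypothetical mix: 90% of weights at 3.0 bits, 10% kept at 6.0 bits.
print(average_bits([(900, 3.0), (100, 6.0)]))
```

So a nominally "2-bit" model can easily land above 3 bits per weight once the shared scales and the higher-precision tensors are counted.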