by danielhanchen
326 days ago
Oh, apologies, I got confused. It's because when we calculate our dynamic quants, we have to do it on the fixed model! For example, in Phi 3 the end-of-sentence token was wrong. If we calibrated on the unfixed model, our quants would be calibrated incorrectly, since chatting with the model uses the actual, correct token. Another example is Llama 4 (https://github.com/ggml-org/llama.cpp/pull/12889), where I fixed a RoPE issue. If we hadn't fixed that first, the calibration process would again be incorrect.
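To make the "calibrate on the fixed model" point concrete, here is a toy sketch. It is not Unsloth's or llama.cpp's actual calibration code. It assumes an imatrix-style statistic, where the importance of an input channel is its mean squared activation over calibration tokens, and a hypothetical mixed-precision rule that keeps the top-k channels at higher precision. The activation numbers are made up. They stand in for a forward pass through the fixed model versus a buggy one (for example, one with a wrong RoPE), and they show how the precision choices follow whatever the calibration forward pass produced.

```python
def channel_importance(activations):
    # activations: calibration token vectors captured from the model's forward pass.
    # Importance of channel j = mean of squared activations (imatrix-style; an assumption).
    n = len(activations)
    dim = len(activations[0])
    return [sum(tok[j] ** 2 for tok in activations) / n for j in range(dim)]


def pick_high_precision_channels(importance, k):
    # Hypothetical rule: keep the k most important channels at higher precision.
    order = sorted(range(len(importance)), key=lambda j: importance[j], reverse=True)
    return sorted(order[:k])


# Made-up activations: fixed model vs. a buggy one (e.g. wrong RoPE or EOS token).
fixed_acts = [[3.0, 0.5, 0.1, 1.0], [2.5, 0.4, 0.2, 1.2]]
buggy_acts = [[0.3, 0.5, 2.8, 1.0], [0.2, 0.6, 3.1, 1.1]]

fixed_keep = pick_high_precision_channels(channel_importance(fixed_acts), k=2)
buggy_keep = pick_high_precision_channels(channel_importance(buggy_acts), k=2)
print(fixed_keep, buggy_keep)  # prints "[0, 3] [2, 3]"
```

Calibrated on the buggy forward pass, the sketch spends its extra precision on channel 2 instead of channel 0. At inference time the fixed model actually relies on channel 0, so that channel ends up quantized too coarsely.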