by BoorishBears
416 days ago
I do? I spend a ton of time post-training models for creative tasks.

The effects of model quantization are usually quantified in terms of performance on benchmaxxed tasks with strong logit probabilities, temp 0, and a "right" answer the model has to pick. Or even worse, they're measured with metrics like perplexity that don't map to anything except themselves (https://arxiv.org/pdf/2407.09141).

I agree Q8 is strong, but I also think the effects of quantization are constantly underappreciated. People often talk about how "a model" performs when they're really using one of 10+ quantized variants of it, each with a distinct performance profile.

Even knowing the bits per weight isn't enough to know exactly how a given quant method affects the model: https://docs.unsloth.ai/basics/unsloth-dynamic-v2.0-ggufs
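
To make the temp 0 point concrete, here's a toy sketch. Everything in it is made up for illustration: the logits, the `NOISE` vector standing in for quantization error, and the helper names. It isn't a measurement of any real quant. It applies the same small logit perturbation to a peaked next-token distribution (the benchmark-style case) and to a nearly flat one (closer to open-ended generation), then checks whether the greedy pick changes and how far the sampling distribution moves.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities at the given sampling temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    """Index of the largest value, i.e. the token greedy (temp 0) decoding picks."""
    return max(range(len(values)), key=lambda i: values[i])

def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def compare(logits, noise):
    """Apply the same perturbation and report its effect on decoding."""
    perturbed = [x + n for x, n in zip(logits, noise)]
    return {
        "greedy_token_changed": argmax(logits) != argmax(perturbed),
        "tv_distance": total_variation(softmax(logits), softmax(perturbed)),
    }

# Hypothetical perturbation standing in for quantization error (assumed values).
NOISE = [-0.15, 0.15, 0.0]

peaked = compare([10.0, 2.0, 1.0], NOISE)  # one token dominates
flat = compare([2.0, 1.9, 1.8], NOISE)     # several tokens are plausible

print("peaked:", peaked)
print("flat:  ", flat)
```

With these numbers the peaked case keeps its greedy token and its distribution barely moves, while the flat case changes its top token and shifts noticeably more probability mass. A temp 0 benchmark made of peaked cases would report "no degradation" for both.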