by tenpa0000
116 days ago
I run Llama 3.2 3B locally for latency-sensitive classification (sub-50ms, so no room for bigger models). At that scale, Q2_K vs Q4_K_M isn't just smaller: Q2 starts flipping yes/no answers that Q4 gets right. Not often, but enough to notice in production. So honestly, the KL divergence numbers here are more useful to me than the MMLU tables. I've had MMLU hold steady while the output distribution drifted enough to break things downstream. Does the calibration dataset make much difference at 3B, though? There's so little redundancy at that size that I'd expect quality to hit a floor pretty fast, regardless of how good the calibration data is.
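For anyone who wants to check the same thing on their own data, here is a rough sketch of the two numbers I mean: mean KL divergence between the two quants' output distributions, and how often the top answer flips. It assumes you can dump per-example logits for the yes/no tokens from each quant. The function names and the logit values are made up for illustration, not measurements from my setup.

```python
import numpy as np

def softmax(logits):
    # Subtract the row max before exponentiating for numerical stability.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mean_kl(ref_logits, test_logits):
    # Mean KL(ref || test) over examples, in nats.
    p = softmax(ref_logits)
    q = softmax(test_logits)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))

def flip_rate(ref_logits, test_logits):
    # Fraction of examples where the argmax (the predicted label) differs.
    return float(np.mean(ref_logits.argmax(axis=-1) != test_logits.argmax(axis=-1)))

# Hypothetical [yes, no] logits for four prompts (illustrative values only).
q4_logits = np.array([[2.0, 0.5], [0.3, 0.1], [-1.0, 1.5], [0.2, 0.0]])
q2_logits = np.array([[1.8, 0.6], [0.1, 0.3], [-0.8, 1.4], [0.25, 0.0]])

print("mean KL:", mean_kl(q4_logits, q2_logits))
print("flip rate:", flip_rate(q4_logits, q2_logits))
```

The flips come from examples where the reference margin is already small, so a modest shift in the distribution is enough to change the label. An aggregate accuracy score can hide that when the flips are rare or partly cancel out.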