by danielhanchen
97 days ago
Oh, https://unsloth.ai/docs/models/qwen3.5/gguf-benchmarks might be helpful: it provides benchmarks for Q4_K_XL vs Q4_K_M etc., comparing disk space against KL divergence (a proxy for how close a quant is to the original full-precision model).

Q4_0 and Q4_1 were supposed to provide faster inference, but tests showed they reduced accuracy by quite a bit, so they are deprecated now.

Q4_K_M and UD-Q4_K_XL are essentially the same; _XL is just slightly bigger than _M. The naming convention is _XL > _L > _M > _S > _XS.
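For anyone unfamiliar with the metric, here is a minimal sketch of KL divergence between a full-precision model's next-token distribution and a quantized model's. The logits are made-up illustration values, not numbers from the linked benchmarks, and real evaluations average this over many token positions.

```python
import math

def softmax(logits):
    """Turn raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) in nats for two distributions over the same vocabulary.

    P is the reference (full precision), Q is the approximation (quantized).
    """
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical logits for one token position over a 3-token vocabulary.
full_precision = softmax([2.0, 1.0, 0.1])
quantized = softmax([1.9, 1.1, 0.0])

print(kl_divergence(full_precision, quantized))
```

A value of 0 means the two distributions are identical; larger values mean the quantized model's predictions have drifted further from the original.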
|
Do you think it's time for version numbers in filenames? Or at least a sha256sum of the merged files when they're big enough to require splitting?
Even with gigabit fiber, model files still take a long time to download, and I usually merge split files and toss the parts when I'm done. By the time I have a full model, I've often lost track of exactly when I downloaded it, so I can't tell whether I have the latest. For non-split models, I can compare the sha256sum on HF, but not for split ones I've already merged. That's why I think we could use version numbers.
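One possible workaround in the meantime, as a rough sketch: hash each part before tossing it and keep the hashes in a small manifest next to the merged file, so they can be compared against the per-file checksums on HF later. The function names and the manifest layout here are my own invention, not an existing tool or convention.

```python
import hashlib
import json
from pathlib import Path

def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 in 1 MiB chunks so large models fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def write_manifest(parts, manifest_path):
    """Record {part filename: sha256} as JSON; run this before deleting the parts."""
    manifest = {Path(p).name: sha256_file(p) for p in parts}
    Path(manifest_path).write_text(json.dumps(manifest, indent=2))
    return manifest
```

Usage would be something like `write_manifest(sorted(Path(".").glob("*-of-*.gguf")), "model.sha256.json")`, where the glob pattern and manifest name are just placeholders for whatever the split files are actually called.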