by refibrillator
383 days ago
> Unsloth Dynamic GGUF which, quality wise in real-world use performs very close to the original

How close are we talking?

I’m not calling you a liar OP, but in general I wish people perpetuating such broad claims would be more rigorous. Unsloth does amazing work, however as far as I’m aware even they themselves do not publish head to head evals with the original unquantized models. I have sympathy here because very few people and companies can afford to run the original models, let alone engineer rigorous evals.

However I felt compelled to comment because my experience does not match. For relatively simple usage the differences are hard to notice, but they become much more apparent in high complexity and long context tasks.
|
For R1 specifically, we did an internal benchmark on the original model - https://unsloth.ai/blog/deepseekr1-dynamic
As for evals on R1-0528 specifically, we're still running them :)) They're quite expensive to run, so we first do a "vibe check" on some internal test cases, and the quants do pretty well!
But we generally stress the bug fixes that we do, which objectively increase accuracy by +1% to sometimes +10%. Those fixes, for example the Llama 4 bug fixes and the Gemma bug fixes (https://news.ycombinator.com/item?id=39671146), are much more important :)
We also provide Q8_0 and Q8_K_XL quants, which are mostly equivalent to FP8 - you can also use the magical `-ot ".ffn_.*_exps.=CPU"` incantation to offload MoE layers to RAM!
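To make the `-ot` pattern a bit less magical, here is a rough Python sketch of which tensors the regex part (`.ffn_.*_exps.`, the text before `=CPU`) would select. The tensor names below are hypothetical examples in the usual GGUF naming style, and the sketch assumes the pattern is applied as an unanchored regex search on each tensor name; it only illustrates the matching and is not how llama.cpp itself is implemented.

```python
import re

# Regex part of the -ot argument quoted above (everything before "=CPU").
PATTERN = r".ffn_.*_exps."

# Hypothetical tensor names, for illustration only.
TENSOR_NAMES = [
    "blk.3.attn_q.weight",
    "blk.3.ffn_gate_exps.weight",
    "blk.3.ffn_up_exps.weight",
    "blk.3.ffn_down_exps.weight",
    "blk.3.ffn_gate_shexp.weight",
    "output.weight",
]


def split_by_pattern(names, pattern=PATTERN):
    """Split names into (matched, unmatched) using an unanchored regex search."""
    rx = re.compile(pattern)
    matched = [n for n in names if rx.search(n)]
    unmatched = [n for n in names if not rx.search(n)]
    return matched, unmatched


# Assumption: matched tensors are the ones the override would keep in system RAM.
cpu, rest = split_by_pattern(TENSOR_NAMES)

for name in cpu:
    print("CPU  :", name)
for name in rest:
    print("other:", name)
```

In this sketch only the three `*_exps` tensors (the per-expert feed-forward weights) match, which is the idea behind the flag: the large MoE expert weights go to RAM while attention and other tensors can stay on the GPU.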