by tucnak
490 days ago
I don't know if they changed the submission title or what, but it says quite explicitly "Deepseek R1 Distill 8B Q40", which is a far cry from "Deepseek R1"; the latter would indeed be misrepresenting the result. However, if you refer to the Distilled Model Evaluation[1] section of the official R1 repository, you will note that DeepSeek-R1-Distill-Llama-8B is not half bad: it supposedly outperforms both 4o-0513 and Sonnet-1022 on a handful of benchmarks.

Remember, sampling from a formal grammar is a thing! This is relevant because llama.cpp has GBNF, and now a lazy grammar[2] setting too, which makes it doubly not half bad for a handful of use cases, not least deployments like this one. That is to say, the model reasons freely and the grammar only kicks in after </think>.

Not to mention, it's always subject to further fine-tuning: multiple vendors now offer "RFT" services, i.e. enriching your normal SFT dataset with synthetic reasoning data from the big-boy R1 himself. For all intents and purposes, this result could be a much more valuable prior than you're giving it credit for! 6 tok/s decoding is not much, but Raspberry Pi people don't care, lol.

[1] https://github.com/deepseek-ai/DeepSeek-R1#distilled-model-e...
[2] https://github.com/ggerganov/llama.cpp/pull/9639
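To make the "grammar kicks in after </think>" point concrete, here is a toy sketch of lazy constrained decoding. To be clear, this is not llama.cpp's code or its GBNF format: `toy_model`, `ALLOWED_ANSWERS` and `lazy_constrained_decode` are hypothetical names made up for illustration, and a ranked candidate list stands in for real logits.

```python
TRIGGER = "</think>"

# Hypothetical toy "grammar": after the trigger, the output must be exactly
# one of these token sequences. A real GBNF grammar is far more expressive.
ALLOWED_ANSWERS = [["yes"], ["no"], ["not", "sure"]]


def allowed_next(prefix):
    """Tokens that keep `prefix` a prefix of some allowed answer.

    Returns "<eos>" as a legal token once an answer is complete.
    """
    legal = set()
    for answer in ALLOWED_ANSWERS:
        if answer[:len(prefix)] == prefix:
            if len(prefix) < len(answer):
                legal.add(answer[len(prefix)])
            else:
                legal.add("<eos>")
    return legal


def lazy_constrained_decode(propose, max_tokens=32):
    """Greedy decoding where the constraint only applies after TRIGGER.

    `propose(tokens)` is an assumed stand-in for a model: it returns
    candidate next tokens ranked best-first.
    """
    tokens, answer, triggered = [], [], False
    for _ in range(max_tokens):
        candidates = propose(tokens)
        if triggered:
            # Mask out every candidate the grammar does not allow here.
            legal = allowed_next(answer)
            candidates = [t for t in candidates if t in legal]
            if not candidates:
                raise ValueError("model proposed no grammar-legal token")
        token = candidates[0]
        if token == "<eos>":
            break
        tokens.append(token)
        if triggered:
            answer.append(token)
        elif token == TRIGGER:
            triggered = True  # reasoning is over, start enforcing the grammar
    return tokens


def toy_model(tokens):
    """Hypothetical model: thinks freely, then prefers off-grammar chatter."""
    script = ["<think>", "hmm", TRIGGER]
    if len(tokens) < len(script):
        return [script[len(tokens)]]
    return ["well", "maybe", "yes", "no", "not", "<eos>"]


print(lazy_constrained_decode(toy_model))
```

The reasoning tokens pass through untouched; once `</think>` appears, the model's top picks ("well", "maybe") get masked and the best grammar-legal token wins instead.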