by byefruit
424 days ago
> Both of our models are trained on top of DeepSeek-R1-Distill-Qwen-7B and DeepSeek-R1-Distill-Qwen-32B.

Not to take away from their work, but this shouldn't be buried at the bottom of the page - there's a gulf between completely new models and fine-tuning.

If they needed to assign their own name to it, they could at least have included the parent (and grandparent) model names in the name,
just as the name DeepSeek-R1-Distill-Qwen-7B makes clear that it is a distilled Qwen model.