by randomifcpfan
503 days ago
In my application, code generation, the distilled DeepSeek models (7B to 70B) perform poorly. They imitate the reasoning style of the R1 model, but their conclusions are often wrong. The real R1 model is great, better than o1, but the distilled models are not even as good as the base models they were built on.