by joe_the_user
502 days ago
> The idea that they used o1's outputs for their distillation further shows that models like o1 are necessary.

Hmm, I think the narrative of the rise of LLMs is that once the output of humans has been distilled by the model, the human isn't necessary. As far as I know, DeepSeek adds only a little to the transformer model, while o1/o3 added a special "reasoning component". If DeepSeek is as good as o1/o3, even taking data from it, then it seems the reasoning component isn't needed.

Distillation is a term of art in AI and it is fundamentally incorrect to talk about distilling human-created data. Only an AI model can be distilled.
https://en.m.wikipedia.org/wiki/Knowledge_distillation#Metho...
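
To make the term concrete, here is a rough sketch of the classic soft-target form of knowledge distillation, where a student model is trained to match a teacher model's output distribution. The function names, the example logits, and the temperature value are illustrative assumptions of mine, not taken from any library or from how DeepSeek actually trained anything. The usual T^2 scaling of the loss is left out to keep it short.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions.

    The training target is the teacher model's full output distribution,
    not a single correct label. (Hypothetical helper for illustration.)
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Made-up logits over three classes.
teacher = [4.0, 1.0, 0.5]
student = [2.0, 2.0, 0.1]

loss_mismatch = distillation_loss(teacher, student)
loss_match = distillation_loss(teacher, teacher)
```

The loss is zero when the student reproduces the teacher's distribution and positive otherwise. A human-written sentence only gives you one observed token at each position, not a probability distribution over all of them, which is why the term is tied to having a teacher model.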