by archon
5 hours ago
I'm uneducated on how distillation works at more than a basic level, so forgive me if this is a stupid question. Isn't "distillation" of another provider's model exactly how these models got their training data in the first place: massive amounts of the written word + Prompt -> Answer? Why wouldn't distillation produce similar "reasoning" in the new model? It's just inputs and outputs.

The intuition is that distillation exploits not only the "right" answer but also the relationship between answers (what's the second most right answer? the third? and so on).
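
To make that concrete, here is a rough sketch in Python. It assumes the student can see the teacher's full output distribution (logits), which is not a given when all you get back is sampled text. The class names, logit values, and temperature are made up for illustration. Two hypothetical students give the top answer the same probability, so a plain "right answer only" loss can't tell them apart, but the loss against the teacher's soft distribution can.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a higher temperature flattens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target_probs, predicted_probs):
    """Cross-entropy of the prediction against a (possibly soft) target distribution."""
    return -sum(t * math.log(p)
                for t, p in zip(target_probs, predicted_probs) if t > 0)

# Hypothetical classes: index 0 = "cat", 1 = "lynx", 2 = "truck".
# The teacher thinks "cat" is right, "lynx" is a close second, "truck" is far off.
teacher_logits = [4.0, 3.0, 0.5]

# Both students rank "cat" first with the same confidence.
# Student A ranks "lynx" second (like the teacher); student B ranks "truck" second.
student_a_logits = [2.0, 1.0, 0.0]
student_b_logits = [2.0, 0.0, 1.0]

T = 2.0  # assumed softening temperature
hard_target = [1.0, 0.0, 0.0]             # only the "right" answer
soft_target = softmax(teacher_logits, T)  # the teacher's full ranking

hard_a = cross_entropy(hard_target, softmax(student_a_logits))
hard_b = cross_entropy(hard_target, softmax(student_b_logits))
soft_a = cross_entropy(soft_target, softmax(student_a_logits, T))
soft_b = cross_entropy(soft_target, softmax(student_b_logits, T))

print("hard-label loss:", hard_a, hard_b)  # the same up to rounding
print("soft-label loss:", soft_a, soft_b)  # A is lower: it matches the teacher's ordering
```

The hard-label loss only looks at the probability assigned to "cat", so it scores both students the same. The soft-label loss also rewards student A for agreeing that "lynx" is a better wrong answer than "truck", which is the extra signal described above.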