by PhunkyPhil
36 days ago
> In distillation, you take a set of prompts you are interested in, and record the big LLM's outputs, then train your small model to produce the same output as the big LLM. Why use the bigger LLM outputs for this and not human outputs? If we assume that human responses to prompts are better than SOTA models (in some cases they are) then why use the big model at all?

You can set up model distillation as a weekend batch job.
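
Roughly, the data-collection half of that batch job could look like the sketch below. It only covers the "record the big LLM's outputs" step from the quote; the fine-tuning step is left out. The names `build_distillation_set`, `to_jsonl`, and `fake_teacher` are made up for illustration, and `fake_teacher` is a stand-in for whatever big model you would actually call, not a real API.

```python
import json
from typing import Callable, Iterable


def build_distillation_set(
    prompts: Iterable[str], teacher: Callable[[str], str]
) -> list[dict]:
    """Record the teacher's output for every prompt as a prompt/completion pair."""
    return [{"prompt": p, "completion": teacher(p)} for p in prompts]


def to_jsonl(records: list[dict]) -> str:
    """Serialize the pairs one JSON object per line, a common fine-tuning input format."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


def fake_teacher(prompt: str) -> str:
    # Hypothetical stand-in: a real job would call the big LLM here.
    return f"ANSWER: {prompt}"


if __name__ == "__main__":
    prompts = ["What is distillation?", "Why not use human labels?"]
    records = build_distillation_set(prompts, fake_teacher)
    print(to_jsonl(records))
```

The point of the sketch is that this loop needs no humans in it: you can run it unattended over as many prompts as you like, then hand the resulting file to whatever training setup you use for the small model.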